Abstract

The  time  complexity  of  wait-free  algorithms  in  “normal”  executions,  where  no  failures  occur 
and  processes  operate  at  approximately  the  same  speed,  is  considered.  A  lower  bound  of  log  n 
on  the  time  complexity  of  any  wait-free  algorithm  that  achieves  approximate  agreement  among 
n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this
problem in constant time. This implies an $\Omega(\log n)$ time separation between the wait-free and
non-wait-free  computation  models.  On  the  positive  side,  we  present  an  O(log  n)  time  wait-free 
approximate  agreement  algorithm;  the  complexity  of  this  algorithm  is  within  a  small  constant 
of  the  lower  bound. 


1  Introduction 


In  shared-memory  distributed  systems,  some  number  n  of  independent  asynchronous  processes 
communicate  by  reading  and  writing  to  shared  memory.  In  such  a  computing  environment,  it  is 
possible  for  processes  to  operate  at  very  different  speeds,  e.g.,  because  of  implementation  issues 
such as communication and memory latency, priority-based time-sharing of processors, cache
misses and page faults. It is also possible for processes to fail entirely. Wait-free algorithms
have  been  proposed  as  a  mechanism  for  computing  in  the  face  of  variable  speeds  and  failures:  a 
wait-free  algorithm  guarantees  that  each  nonfaulty  process  terminates  regardless  of  the  speed 
and failure of other processes ([23, 28]).¹ The design of wait-free algorithms has been a very
active  area  of  research  recently  (see,  e.g.,  [1,  2,  4,  14,  23,  28,  29,  32,  42,  43,  45,  48]). 

Because  wait-free  algorithms  guarantee  that  fast  processes  terminate  without  waiting  for 
slow  processes,  wait-free  algorithms  seem  to  be  generally  thought  of  as  fast.  However,  while 
it  is  obvious  from  the  definition  that  wait-free  algorithms  are  highly  resilient  to  failures,  we 
believe  that  the  assumption  that  such  algorithms  are  fast  requires  more  careful  examination. 

We study the time complexity of wait-free and non-wait-free algorithms in “normal” executions,
where no failures occur and processes operate at approximately the same speed. We
select  this  particular  subset  of  the  executions  for  making  the  comparison,  because  it  is  only 
reasonable  to  compare  the  behavior  of  the  algorithms  in  cases  where  both  are  required  to 
terminate. Since wait-free algorithms terminate even when some processes fail, while
non-wait-free algorithms may fail to terminate in this case, the comparison should only be made in
executions in which no process fails, i.e., in failure-free executions. The time measure we use
is  the  one  introduced  in  [26,  27],  and  used  to  evaluate  the  time  complexity  of  asynchronous 
algorithms,  in,  e.g.,  [3,  12,  34,  35,  44].  To  summarize,  we  are  interested  in  measuring  the  time 
cost  imposed  by  the  wait-free  property,  as  measured  in  terms  of  extra  computation  time  in  the 
most  normal  (failure-free)  case. 

In this paper, we address the general question by considering a specific problem, the
approximate agreement problem studied, for example, in [15, 19, 20, 36]; we study this problem
in the context of a particular shared-memory primitive: single-writer multi-reader atomic
registers. In this problem, each process starts with a real-valued input, and (provided it does not
fail) must eventually produce a real-valued output. The outputs must all be within a given
distance $\varepsilon$ of each other, and must be included within the range of the inputs. This problem,
a  weaker  variant  of  the  well-studied  problem  of  distributed  consensus  (e.g.,  [21,  30]),  is  closely 
related  to  the  important  problem  of  synchronizing  local  clocks  in  a  distributed  system. 

Approximate  agreement  can  be  achieved  very  easily  if  waiting  is  allowed,  by  having  a 
designated  process  write  its  input  to  the  shared  memory;  all  other  processes  wait  for  this 
value  to  be  written  and  adopt  it  as  their  outputs.  In  terms  of  the  time  measure  described 
above, it is easy to see that the time complexity of this algorithm is constant: independent
of n, the range of inputs and $\varepsilon$. On the other hand, there is a relatively simple wait-free
algorithm for this problem, which we describe in Section 3, and which is based on successive
averaging of intermediate values. The time complexity of this algorithm depends linearly on
n and logarithmically on the size of the range of input values and on $1/\varepsilon$. A natural question
to ask is whether the time complexity of this algorithm is optimal for wait-free approximate
agreement algorithms.

¹ Wait-free is the shared-memory analogue of the non-blocking property for synchronous transaction systems
(cf. [10, 47]).

Our first major result is an algorithm for the special case where n = 2, whose time
complexity is constant, i.e., it does not depend on the range of inputs or on $\varepsilon$ (Section 5). The
algorithm uses a novel method of overcoming the uncertainty that is inherent in an
asynchronous environment, without resorting to synchronization points (cf. [22]) or other waiting
mechanisms  (cf.  [12]):  this  method  involves  ensuring  that  the  two  processes  base  their  decisions 
on  information  that  is  approximately,  but  not  exactly,  the  same. 

Next, using a powerful technique of integrating wait-free (but slow) and non-wait-free (but
fast) algorithms, together with an O(log n) wait-free input collection function, we generalize the
key ideas of the 2-process algorithm to obtain our second major result: a wait-free algorithm
for approximate agreement whose time complexity is O(log n) (Section 6). Thus, the time
complexity of this algorithm does not depend on either the size of the range of input values or
on $\varepsilon$, but it still depends on n, the number of processes.

At  this  point,  it  is  natural  to  ask  whether  the  logarithmic  dependence  on  n  is  inherent 
for  wait-free  approximate  agreement  algorithms,  or  whether,  on  the  other  hand,  there  is  a 
constant-time wait-free algorithm (independent of n). Our third major result shows that the
log n dependency is inherent: any wait-free algorithm for approximate agreement has time
complexity at least log n (Section 7).² This implies an $\Omega(\log n)$ time separation between the
non-wait-free  and  wait-free  computation  models. 

We  note  that  the  constant-time  2-process  algorithm  behaves  rather  badly  if  one  of  the 
processes  fails.  The  work  performed  in  an  execution  of  an  algorithm  is  the  total  number  of 
atomic  operations  performed  in  that  execution  by  all  processes  before  they  decide.  We  present  a 
tradeoff  between  the  time  complexity  of  and  the  work  performed  by  any  wait-free  approximate 
agreement  algorithm.  We  show  that  for  any  wait-free  approximate  agreement  algorithm  for  2 
processes, there exists an execution in which the work exhibits a nontrivial dependency on $\varepsilon$
and  the  range  of  inputs. 

In  practice,  the  design  of  distributed  systems  is  often  geared  towards  optimizing  the  time 
complexity  in  “normal  executions,”  i.e.,  executions  where  no  failures  occur  and  processes  run  at 
approximately  the  same  pace,  while  building  in  safety  provisions  to  protect  against  failures  (cf. 
[31]).  Our  results  indicate  that,  in  the  asynchronous  shared-memory  setting,  there  are  problems 
for  which  building  in  such  safety  provisions  must  result  in  performance  degradation  in  the 
normal  executions.  This  situation  contrasts  with  that  occurring,  for  example,  in  synchronous 
systems  that  solve  the  distributed  consensus  problem.  In  that  setting,  there  are  early-stopping 
algorithms (e.g., [16, 18, 40]) that tolerate failures, yet still terminate in constant time when no
failures occur. The exact cost imposed by fault-tolerance on normal executions was studied,
for example, in [9, 18, 40]. For synchronous message-passing systems, it has been shown that
non-blocking protocols take twice as much time, in failure-free executions, as blocking protocols
([10]).

² The lower bound is attained in an execution where processes run synchronously and no process fails.

Recent  work  has  addressed  the  issue  of  adapting  the  usual  synchronous  shared-memory 
PRAM  model  to  better  reflect  implementation  issues,  by  reducing  synchrony  ([12,  13,  22,  41, 
37])  or  by  requiring  fault-tolerance  ([25,  24]).  To  the  best  of  our  knowledge,  the  impact  of 
the combination of asynchrony and fault-tolerance (as exemplified by the wait-free model) on
the time complexity of shared-memory algorithms has not previously been studied. In [38],
Martel,  Subramonian  and  Park  present  efficient  fault-tolerant  asynchronous  PRAM  algorithms. 
Their  algorithms  optimize  work  rather  than  time  and  employ  randomization.  Another  major 
difference  is  that  they  assume  that  inputs  are  stored  in  the  shared  memory,  so  that  every 
process  can  access  the  input  of  every  other  process. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  present  formal  definitions  of  the 
systems  considered  in  this  paper  and  introduce  the  time  measure.  The  approximate  agreement 
problem  is  defined  in  Section  3,  where  we  also  present  a  fast  non-wait-free  algorithm  and  a 
slow wait-free algorithm for reaching approximate agreement. Section 4 introduces a “bias”
function on which the algorithms in the following sections are based. Proofs of the various
properties  of  this  function  are,  to  ease  the  presentation,  deferred  to  Section  9.  A  constant  time 
wait-free  algorithm  for  approximate  agreement  between  two  processes  is  presented  and  proven 
correct in Section 5; key ideas from this algorithm are used in the O(log n) time wait-free
approximate  agreement  algorithm  presented  in  Section  6.  Section  7  contains  the  log  n  time 
lower  bound  for  wait-free  approximate  agreement  algorithms.  Section  8  presents  the  lower 
bound  for  the  tradeoff  between  the  time  complexity  and  the  work  complexity  of  a  wait-free 
algorithm  for  approximate  agreement.  We  conclude,  in  Section  10,  with  a  discussion  of  the 
results  and  directions  for  future  research. 


2  Model  of  Computation  and  Time  Measure 

In  this  section  we  describe  the  systems  and  the  time  measure  we  will  consider.  Our  definitions 
are  standard  and  are  similar  to  the  ones  in,  e.g.,  [3,  23,  28,  33,  34]. 

A system consists of n processes $p_0, \ldots, p_{n-1}$. Each process is a deterministic state machine,
with a possibly infinite number of states. We associate with each process a set of local states.
Among the states of each process are a subset called the initial states and another subset
called the decision states. Processes communicate by reading and writing to single-writer
multi-reader atomic registers $R_1, R_2, \ldots$ (also called shared variables). Each process $p_i$ has two
atomic operations available to it that operate on a shared register R:

• write(R, v) writes the value v to the shared variable R.

• read(R) reads the shared variable R and returns its value v.
A system configuration consists of the states of the processes and the registers. Formally,
a configuration C is a vector $(s_0, \ldots, s_{n-1}, v_1, \ldots)$ where $s_i$ is the local state of process $p_i$
and $v_j$ is the value of the shared variable $R_j$. Each shared variable may attain values from
some domain which includes a special “undefined” value, $\bot$. An initial configuration is a
configuration in which every local state is an initial state and all shared variables are set to
$\bot$. For a configuration $C = (s_0, \ldots, s_{n-1}, v_1, \ldots)$, $state(p_i, C)$ denotes the state of $p_i$ in C and
$val(R_j, C)$ denotes the value of register $R_j$ in C, i.e., $state(p_i, C) = s_i$ and $val(R_j, C) = v_j$.

We consider an interleaving model of concurrency, where executions are modeled as
sequences of steps. Each step is performed by a single process. In a step, a process $p_i$ performs either a
write(R, v) operation or a read(R) operation (which returns a value v), but not both, performs
some local computation, and changes to its next local state. The next configuration is the
result of these modifications. We assume that each process $p_i$ follows a local algorithm $A_i$ that
deterministically determines $p_i$'s next step: $A_i$ determines a variable R and whether $p_i$ is to
read or write R as a function of $p_i$'s local state. If $p_i$ is to read R, then $A_i$ determines $p_i$'s next
state as a function of $p_i$'s current state and the value v read from R. If $p_i$ is to write R, then
$A_i$ determines $p_i$'s next state and the value v to be written to R as a function of $p_i$'s current
state. An algorithm is a function A mapping each i to a local algorithm $A_i$ for $p_i$.

An event on $p_i$ is simply $p_i$'s index i. A schedule is a finite or infinite sequence of events.
We denote by $\lambda$ the empty schedule, with no events. We denote the configuration resulting
from the application of a finite schedule $\sigma$ to a configuration C by $C\sigma$. An execution fragment
starting from a configuration C is a finite or infinite alternating sequence of configurations and
events, $C_0, i_1, C_1, \ldots, C_{k-1}, i_k, \ldots$, where $C = C_0$ and $C_k = C_{k-1} i_k$ for all $k \geq 1$. We assume
that a finite execution fragment ends with a configuration. The schedule associated with this
execution fragment is $i_1, \ldots, i_k, \ldots$. Conversely, the (unique) execution fragment resulting from
applying a schedule $\sigma$ to a configuration C is denoted by $(C, \sigma)$. An execution is an execution
fragment starting with an initial configuration.

Given an infinite schedule $\sigma$, a process is faulty in $\sigma$ if it takes a finite number of steps
(i.e., has a finite number of events) in $\sigma$, and nonfaulty otherwise. An infinite schedule $\sigma$
is f-admissible if at most f processes are faulty in $\sigma$. In particular, a 0-admissible schedule
is  called  failure-free.  These  definitions  also  apply  to  execution  fragments  by  means  of  their 
associated  schedules. 

Let $\mathcal{I}$ be a fixed input domain and $\mathcal{D}$ be a fixed decision domain. Each initial state of $p_i$ is
associated with an input value in $\mathcal{I}$. For each process $p_i$ and $d \in \mathcal{D}$ we define a subset, $D_{i,d}$,
of the states of $p_i$. We assume that for each $p_i$, the sets $D_{i,d}$ are pairwise disjoint. We also
assume that decisions are irrevocable, i.e., the algorithm transitions are such that if $p_i$ is in a
state of $D_{i,d}$ it will remain in a state of $D_{i,d}$. We call the set $D_{i,d}$ the d-decision states of $p_i$.

A decision problem (or just problem) $\Pi$ of size n is a relation between $\mathcal{I}^n$ and $\mathcal{D}^n$. An
algorithm f-solves a decision problem $\Pi$ if in all executions the decisions made can be completed
to a decision vector that is in the relation $\Pi$ to the inputs of the processes and, furthermore, in
any f-admissible execution, every nonfaulty process eventually decides. An algorithm that
(n − 1)-solves a problem $\Pi$ is also called a wait-free algorithm for $\Pi$. Intuitively, even if all
processes but one fail when a wait-free algorithm is executed, this process eventually decides.

We now define how to measure the time an execution takes.³ We assign times to events in
a schedule subject to the following constraints: (a) the time assigned to the first event of any
process is at most 1, and (b) the time between two events of the same process is at most 1.
The time of a finite schedule $\sigma$ is the largest amount of real time that can be assigned to the
last event in the schedule; denote this by $time(\sigma)$. The time between two events in a schedule
is  the  largest  amount  of  real  time  that  can  elapse  between  these  two  events  under  any  time 
assignment  to  this  schedule.  We  define  the  time  taken  by  an  execution  to  be  the  time  taken 
by  the  associated  schedule.  (This  definition  follows  [34,  44].) 

An  equivalent  definition  (cf.  [3])  is  obtained  by  externally  partitioning  the  computation 
into  minimal  rounds:  a  round  is  any  sequence  of  events  such  that  every  process  takes  a  step  at 
least  once  in  the  sequence.  A  minimal  round  is  a  round  such  that  no  proper  prefix  of  it  is  a 
round. Every sequence of events can be uniquely partitioned into minimal rounds.⁴ The time
for  an  execution  is  defined  to  be  the  number  of  segments  in  the  unique  partition  into  minimal 
rounds.  (This  is  the  definition  introduced  in  [26,  27],  called  the  round  complexity  in  [12].) 
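
As a small illustration of the round-based definition, the following Python sketch partitions a finite schedule into minimal rounds and counts the segments. It is only a sketch under our own assumptions: a schedule is represented as a list of process indices, and the names minimal_rounds and time_of are hypothetical helpers introduced here, not part of the formal model.

```python
from typing import List, Sequence


def minimal_rounds(schedule: Sequence[int], n: int) -> List[List[int]]:
    """Partition a schedule (a sequence of process indices) into minimal rounds.

    A segment is closed as soon as each of the n processes has taken at
    least one step in it, so no proper prefix of a closed segment is a round.
    The last segment may be incomplete.
    """
    rounds: List[List[int]] = []
    current: List[int] = []
    seen = set()
    for event in schedule:
        current.append(event)
        seen.add(event)
        if len(seen) == n:          # every process has stepped: the round is complete
            rounds.append(current)
            current, seen = [], set()
    if current:                     # possibly incomplete last segment
        rounds.append(current)
    return rounds


def time_of(schedule: Sequence[int], n: int) -> int:
    """Number of segments in the partition of the schedule into minimal rounds."""
    return len(minimal_rounds(schedule, n))
```

For instance, with three processes the schedule 0, 0, 1, 2, 1, 0, 2, 2 splits into the segments (0, 0, 1, 2), (1, 0, 2) and the incomplete segment (2).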

The running time for $p_i$ in an execution of an algorithm A is defined to be the time
associated with the shortest finite prefix of this execution in which $p_i$ is in a decision state
($\infty$, if there is no such prefix). The time complexity of an algorithm A is the supremum of the
running times over all failure-free executions of A and all processes $p_i$.

We conclude this section with some useful notation. Let X be a set of real numbers.
Define $range(X)$ to be the interval $[\min_{x \in X} x, \max_{x \in X} x]$ if X is nonempty, and $\emptyset$ otherwise.
Define $diam(X)$ to be $\max_{x_1, x_2 \in X} |x_1 - x_2|$ if X is nonempty, and 0 otherwise. Note that if
X is nonempty then $diam(X)$ is the length of the interval $range(X)$. If X is nonempty, then
$mid(X) = \frac{\min_{x \in X} x + \max_{x \in X} x}{2}$.
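
The three operators can be mirrored directly in code. The sketch below is illustrative only; it assumes a finite collection of floats, reads mid as the midpoint of the range, and uses None as our own stand-in for the empty range. The name range_of is hypothetical, chosen to avoid shadowing Python's built-in range.

```python
from typing import Iterable, Optional, Tuple


def range_of(xs: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Endpoints of the smallest interval containing xs (None if xs is empty)."""
    xs = list(xs)
    return (min(xs), max(xs)) if xs else None


def diam(xs: Iterable[float]) -> float:
    """Largest distance between two elements of xs (0 if xs is empty)."""
    xs = list(xs)
    return max(xs) - min(xs) if xs else 0.0


def mid(xs: Iterable[float]) -> float:
    """Midpoint of the range of xs; defined only for nonempty xs."""
    xs = list(xs)
    if not xs:
        raise ValueError("mid is defined only for nonempty sets")
    return (min(xs) + max(xs)) / 2
```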

3  Basic  Solutions  to  the  Approximate  Agreement  Problem 

We  start  by  defining  the  approximate  agreement  problem  and  describing  non-wait-free  and 
wait-free  algorithms  to  solve  it.  In  the  approximate  agreement  problem,  processes  start  with 
real-valued inputs, $x_0, \ldots, x_{n-1}$, and a constant $\varepsilon > 0$ (the same $\varepsilon$ for all processes); all
nonfaulty processes are required to decide on real-valued outputs $y_0, \ldots, y_{n-1}$, such that the
following conditions hold:

Agreement: for any i, j, $|y_i - y_j| \leq \varepsilon$, and
Validity: for any i, $y_i \in range(\{x_0, \ldots, x_{n-1}\})$.
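
The two conditions are easy to check mechanically for a given vector of inputs and decisions. The following Python sketch does so under assumptions of our own: the agreement bound is read as non-strict, a process that has not decided (for example, a faulty one) is represented by None, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def satisfies_approximate_agreement(
    inputs: Sequence[float],
    outputs: Sequence[Optional[float]],
    eps: float,
) -> bool:
    """Check Agreement and Validity for the processes that have decided."""
    lo, hi = min(inputs), max(inputs)
    decided = [y for y in outputs if y is not None]
    # Agreement: every pair of decisions is within eps of each other.
    agreement = all(abs(a - b) <= eps for a in decided for b in decided)
    # Validity: every decision lies in the range of the inputs.
    validity = all(lo <= y <= hi for y in decided)
    return agreement and validity
```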

³ These definitions can also be formalized in the timed automaton model ([39, 6]).

⁴ Except, possibly, for the last segment.
Process p_0:

    function wait-approx(x);
    begin
        V_0 := x;
        return x;
    end;

Process p_i, i ≠ 0:

    function wait-approx(x);
    begin
        repeat until V_0 ≠ ⊥;  /* wait */
        return V_0;
    end;

Figure 1: Fast non-wait-free n-process approximate agreement.

This problem has a simple O(1) time non-wait-free solution, described in Figure 1. Process
$p_0$ maintains a single-writer multi-reader atomic register, $V_0$, to which it writes its input value
as soon as it starts the algorithm. All other processes wait until $V_0$ is set to a value that is not $\bot$ and
decide on this value. In the code, any assignment to a shared variable implies a write, and a
reference to the value of a shared variable implies a read. Upper case variables denote shared
variables, while all lower case variables are local. In this algorithm, the values returned in the
return statements are the decision values. Later in the paper, we will use this algorithm as
a “subroutine” in our main algorithm; then the values returned in the return statements will
not be the final decision values. Similar conventions hold for later algorithms in the paper.
We have:

Theorem 3.1 Procedure wait-approx is a non-wait-free algorithm for the approximate agreement
problem whose running time is O(1).

We next present a wait-free algorithm for approximate agreement. In addition to
demonstrating that a wait-free solution exists for this problem, this algorithm will also be used as a
“building block” in the construction of a more efficient algorithm, in Section 6.

Let us begin by outlining a simple variant of the algorithm for the case of two processes.
Each of the processes $p_i$, $i \in \{0, 1\}$, has a register which it can write and the other can read.
Here and elsewhere, we let $\bar{\imath}$ denote the index of the other process, i.e., $\bar{\imath} = 1 - i$. Due to the
asynchrony in the system, it is impossible to have processes agree on one of the input values
(see [17, 21, 33]). Thus, our algorithm has them gradually converge from the input values $x_0$
and $x_1$ to values that are only $\varepsilon$ apart. A process $p_i$ repeatedly does the following: It writes its
value $v_i$ (initially the input value $x_i$) into its register, and then reads $p_{\bar{\imath}}$'s register. If $p_i$ reads
$\bot$ from $p_{\bar{\imath}}$'s register, it must decide on its own value, since it can never know when $p_{\bar{\imath}}$ will write its input
value (if at all, because $p_{\bar{\imath}}$ could have failed before writing). If $p_i$ reads a non-$\bot$ value $v_{\bar{\imath}}$,
it checks whether or not $|v_{\bar{\imath}} - v_i| \leq \varepsilon$. If it is, $p_i$ decides on its own value. If not, $p_i$ sets $v_i$ to
be $\frac{v_i + v_{\bar{\imath}}}{2}$ and repeats.

Due to asynchrony, processes do not necessarily converge “directly” to a value. Rather, the
following type of scenario is possible: $p_{\bar{\imath}}$, having formerly written $v_{\bar{\imath}}$, reads $p_i$'s current value
$v_i$ and is delayed just before writing $\frac{v_i + v_{\bar{\imath}}}{2}$ to its register; then $p_i$ repeatedly reads and writes,
cutting the interval in half until its value is very close to $v_{\bar{\imath}}$; finally, $p_{\bar{\imath}}$ completes the write of
$\frac{v_i + v_{\bar{\imath}}}{2}$ to its register, so that in fact $p_i$ moved “too far” towards $p_{\bar{\imath}}$'s old value. This can repeat
itself again and again. However, in every such step of O(1) time (in which both $p_i$ and $p_{\bar{\imath}}$
perform a read and a write), the diameter of the proposed values, $|v_i - v_{\bar{\imath}}|$, is cut by at least a
half, and so the values converge in $O(\log(|x_0 - x_1| / \varepsilon))$ time. The algorithm is wait-free, since each
process can reach a decision independently of the other taking steps.
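
To make the two-process scheme concrete, the following Python sketch simulates it under an explicit schedule. It is a simplified model under assumptions of our own, not the algorithm of the later sections: each event in the schedule performs exactly one register operation (a write or a read), None stands for the undefined register value, the closeness test is non-strict, and the name run_two_process is hypothetical.

```python
from typing import List, Optional, Sequence


def run_two_process(
    x0: float, x1: float, eps: float, schedule: Sequence[int]
) -> List[Optional[float]]:
    """Simulate the two-process averaging scheme; return each process's decision."""
    value = [x0, x1]                                   # local values v_0, v_1
    register: List[Optional[float]] = [None, None]     # None models the undefined value
    next_op = ["write", "write"]                       # each process starts by writing
    decision: List[Optional[float]] = [None, None]
    for i in schedule:
        if decision[i] is not None:
            continue                                   # a decided process does nothing more
        other = 1 - i
        if next_op[i] == "write":
            register[i] = value[i]
            next_op[i] = "read"
        else:
            seen = register[other]
            if seen is None or abs(seen - value[i]) <= eps:
                decision[i] = value[i]                 # decide on own value
            else:
                value[i] = (value[i] + seen) / 2       # move halfway towards the other value
                next_op[i] = "write"
    return decision
```

Under the round-robin schedule 0, 1, 0, 1, ... with inputs 0 and 8 and tolerance 1, both processes average once to 4 and then decide on 4; if only process 0 takes steps, it reads the undefined value and decides on its own input.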

The algorithm for n > 2 processes is of the same flavor, but uses more complicated
mechanisms to synchronize among processes. It uses ideas similar to those used in the randomized
consensus algorithm of [4]. The computation proceeds in (asynchronous) phases; in each phase,
each process suggests a possible decision value. In a manner similar to that of the two process
scheme above, the range of suggestions shrinks by a constant factor at each phase, until after
$O(\log(diam(\{x_0, \ldots, x_{n-1}\}) / \varepsilon))$ phases it becomes small enough to allow processes to decide. Because
there  may  be  more  than  two  processes,  a  problem  may  arise  in  the  case  of  an  execution  in 
which  certain  slow  processes  temporarily  stop  taking  steps  (i.e.  cease  advancing  in  phases), 
while  others  (more  than  one)  continue  to  advance,  and  then  those  slow  processes  return  to 
taking  steps  again.  The  algorithm  must  allow  the  fast  processes  to  coordinate  a  decision,  while 
at  the  same  time  guaranteeing  that  the  ones  that  are  temporarily  slow,  will  converge  to  the 
same  decision  once  they  resume  activity.  The  key  idea  in  achieving  this  task  is  to  allow  fast 
processes  that  have  converged  to  approximately  the  same  suggested  value,  and  are  ahead  of  all 
processes  with  contradictory  suggestions  by  at  least  two  phases,  to  decide.  As  will  be  shown, 
it  can  be  guaranteed  that  the  processes  at  lower  phases  will  join  this  decision  value. 

The algorithm appears in Figure 2. The inputs to each process $p_i$ are real numbers $x_i$ and
$\varepsilon$.⁵ For a real number x, define $n_\varepsilon(x)$, the $\varepsilon$-neighborhood of x, to be $[x - \varepsilon, x + \varepsilon]$. The algorithm
employs a single-writer atomic snapshot object S as a basic memory primitive. Informally, this
is a data structure partitioned into n segments $S_i$, each of which can be updated (written) by
one process, and all of which can be scanned (read) by any process in one atomic operation.
(More precise specifications and implementations of snapshot objects from single-writer
multi-reader atomic registers can be found in [1, 2].) For each process $p_i$, its segment of S is an array
$S_i[1..]$ that in any state contains a finite sequence of reals, its suggestions at different phases,
indexed by phase number. Initially, each sequence is $\lambda$, the empty sequence. At each phase,
after updating (writing) a suggestion to its array (Line 2), a process $p_i$ reads the arrays of all
processes (Line 3), obtaining their suggestions for all phases.⁶ If $p_i$ is at the maximum phase
and all the suggestions by other processes for its phase, or the phase before it, are within $\varepsilon$ of
its latest suggestion, then $p_i$ decides on its latest suggestion (Lines 4-5). Otherwise (Line 6),
$p_i$ advances to the next phase taking as its new suggestion the midpoint of all the suggestions
at the next phase if there are any, or of its current phase if there are none. Let us make two
final remarks before proceeding to prove the algorithm's properties. In the algorithm, since
a process first writes to its own sequence and then reads all sequences (including its own), it
follows that phase ≤ max-phase. Also, note that in Line 6, r is set to be the number of a phase
for which there is at least one suggestion. Thus, the mid operator is applied to a nonempty
set in Line 7.

⁵ Although $\varepsilon$ is described as a parameter, it is guaranteed that all processes have exactly the same value of $\varepsilon$.

⁶ Though one can devise algorithms that do not require a process to maintain suggestions for all past phases,
we have chosen to do so in order to simplify the exposition and proofs.

    function wait-free-approx(x, ε);
    begin
    1:  phase := 1;
        repeat forever
    2:      update(S_i[phase] := x);
    3:      s := scan(S);
    4:      max-phase := max_{0 ≤ j ≤ n−1} {|s_j|};   /* Note that phase ≤ max-phase */
    5:      if phase = max-phase and phase ≥ 2
               and s_j[r] ∈ n_ε(x) for all j ∈ {0, ..., n−1} and all r ≥ phase − 1
            then return x;
    6:      else r := min{phase + 1, max-phase};
    7:           x := mid({s_j[r] : |s_j| ≥ r});   /* Note that this set is not empty */
    8:           phase := phase + 1;
            fi;
        end repeat
    end;

Figure 2: Slow wait-free n-process approximate agreement (code for process i).
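
The phase-based algorithm of Figure 2 can be exercised with a small simulation. The Python sketch below is an illustration under assumptions of our own rather than a faithful implementation: the snapshot object is modeled by a list of lists that a single-threaded simulator copies in one step, each event of the schedule performs either one update or one scan together with the local computation that follows it, the comparisons are read as non-strict, and the name run_wait_free_approx is hypothetical.

```python
from typing import List, Optional, Sequence


def mid(values: Sequence[float]) -> float:
    """Midpoint of the range of a nonempty collection of reals."""
    return (min(values) + max(values)) / 2


def run_wait_free_approx(
    inputs: Sequence[float], eps: float, schedule: Sequence[int]
) -> List[Optional[float]]:
    """Simulate Figure 2 under a schedule; return each process's decision (or None)."""
    n = len(inputs)
    S: List[List[float]] = [[] for _ in range(n)]   # S[i][r-1] is p_i's suggestion for phase r
    x = list(inputs)                                # current suggestion of each process
    phase = [1] * n                                 # Line 1
    next_op = ["update"] * n
    decision: List[Optional[float]] = [None] * n
    for i in schedule:
        if decision[i] is not None:
            continue                                # decided processes are done
        if next_op[i] == "update":                  # Line 2
            S[i].append(x[i])
            next_op[i] = "scan"
            continue
        s = [list(seq) for seq in S]                # Line 3: atomic scan (copy)
        max_phase = max(len(seq) for seq in s)      # Line 4
        first = max(phase[i] - 2, 0)                # index of phase (phase - 1)
        close = all(abs(v - x[i]) <= eps for seq in s for v in seq[first:])
        if phase[i] == max_phase and phase[i] >= 2 and close:   # Line 5
            decision[i] = x[i]
        else:
            r = min(phase[i] + 1, max_phase)        # Line 6
            x[i] = mid([seq[r - 1] for seq in s if len(seq) >= r])  # Line 7
            phase[i] += 1                           # Line 8
            next_op[i] = "update"
    return decision
```

With inputs 0 and 8, tolerance 1 and a round-robin schedule, both processes decide on 4 after three phases; a process running entirely alone decides on its own input after two phases, which illustrates the wait-free property.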

We now present the correctness proof for this algorithm. Since the only shared data
structure used by the algorithm is the atomic snapshot object S, an execution of the algorithm can
be viewed as a sequence of primitive atomic operations that are updates and scans of S. Let
$\alpha$ be any execution, and let $r \geq 1$ be a phase number.

For any process $j \in \{0, \ldots, n-1\}$, define $S_j^{\alpha}[r]$ to be the value written by $p_j$ to $S_j[r]$ in
$\alpha$ ($\bot$, if there is no such value). Note that this value is uniquely defined. Define $S^{\alpha}[r]$ to be
$\{S_j^{\alpha}[r] \neq \bot : j \in \{0, \ldots, n-1\}\}$. The following is immediate:

Lemma 3.2 Let $\alpha$ be an execution and let $\alpha'$ be a finite prefix of $\alpha$. Then $S^{\alpha'}[r] \subseteq S^{\alpha}[r]$, for
every $r \geq 1$.

Throughout the proofs in this paper, a subscript i for a procedure denotes invocation by
process $p_i$; similarly, a subscript i for a local variable name denotes the copy of this variable
at process $p_i$. A process $p_i$ is said to be in phase r if $phase_i = r$. Denote by $scan_i^r$ the scan
performed by $p_i$ at phase r, and by $update_i^r(x)$ the update by $p_i$ at phase r. Note that, for
$r \geq 2$, the scan performed before writing a suggestion for phase r is denoted $scan_i^{r-1}$.

For a finite or infinite execution $\alpha$ and $r \geq 1$, denote

$mids(\alpha, r) = \{ mid(S^{\alpha'}[r]) : \alpha' \text{ is a prefix of } \alpha \text{ and } S^{\alpha'}[r] \text{ is nonempty} \}$,

that is, the set of midpoints of all the sets of suggestions for phase r at earlier points of $\alpha$.
The next lemma is the key for proving that the algorithm is wait-free. It will be used later,
in Lemma 3.7, to show that the range of suggestions decreases by a constant factor with
each phase. Intuitively, it states that any suggestion for phase r must be in the range of the
midpoints of all the sets of suggestions for phase r − 1 at earlier points in the execution.

Lemma 3.3 For any finite execution $\alpha$ and phase $r \geq 2$, $range(S^{\alpha}[r]) \subseteq range(mids(\alpha, r-1))$.

Proof:  By  induction  on  the  length  of  the  execution.  The  basis  holds  vacuously. 

For the inductive step, the interesting case is when $\alpha$ ends with $update_i^r(x)$, for some $i$,
where $x = S_i^{\alpha}[r]$. Then $scan_i^{r-1}$ appears in $\alpha$. Let $\alpha'$ be the shortest prefix of $\alpha$ that includes
$scan_i^{r-1}$. Note that $\alpha'$ is a proper prefix of $\alpha$.

Let $r'$ be the largest phase number read in $scan_i^{r-1}$. Since process $p_i$ reads its own sequence,
$r' \ge r - 1$. If $r' = r - 1$, then the code implies that $x$ is the midpoint of $S^{\alpha'}[r-1]$, which
suffices. If $r' \ge r$ then, by the code, $x = mid(S^{\alpha'}[r])$. By the induction hypothesis on $\alpha'$,
$range(S^{\alpha'}[r]) \subseteq range(mids(\alpha', r-1))$. Thus,

$x = mid(S^{\alpha'}[r]) \in range(S^{\alpha'}[r]) \subseteq range(mids(\alpha', r-1)) \subseteq range(mids(\alpha, r-1))$,

as  needed.  ■ 
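
To make the notation concrete, the following sketch computes mid, diam and mids on a toy example. Encoding the prefixes of an execution as a list of growing suggestion sets is an assumption made only for illustration; the function names mirror the paper's notation but are not taken from its code.

```python
def mid(values):
    """Midpoint of the range of a nonempty collection of reals."""
    return (min(values) + max(values)) / 2


def diam(values):
    """Diameter of a collection of reals; 0 for the empty collection."""
    return max(values) - min(values) if values else 0


def mids(history):
    """Midpoints of every nonempty suggestion set in a history.

    `history` lists the suggestion set of one phase as it looks after
    successive prefixes of an execution (hypothetical encoding).
    """
    return {mid(s) for s in history if s}


# Suggestion sets for one phase only grow along an execution.
history = [set(), {0.0}, {0.0, 8.0}, {0.0, 8.0, 2.0}]
phase_mids = mids(history)
final_set = history[-1]
```

In this example every midpoint lies inside the range of the final suggestion set, which is the containment used to derive Corollary 3.4 from Lemma 3.3.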

Since $range(mids(\alpha, r-1)) \subseteq range(S^{\alpha}[r-1])$, we have:

Corollary 3.4 For any finite execution $\alpha$ and phase $r \ge 2$, $range(S^{\alpha}[r]) \subseteq range(S^{\alpha}[r-1])$.

For the rest of the proof, we fix some infinite execution $\beta$ of the algorithm. The following
lemmas are stated with respect to $\beta$. The following is a corollary of Lemma 3.3.

Corollary 3.5 For any phase $r \ge 2$, $range(S^{\beta}[r]) \subseteq range(mids(\beta, r-1))$.

The next lemma states that the diameter of all the possible midpoints of the suggestions
in phase $r$ is at most half the diameter of all the suggestions for phase $r$.

Lemma 3.6 For any phase $r \ge 1$, $diam(mids(\beta, r)) \le \frac{1}{2} diam(S^{\beta}[r])$.

Proof: If $mids(\beta, r)$ is empty then $diam(mids(\beta, r)) = 0$ and the claim follows immediately,
so assume that $mids(\beta, r)$ is nonempty. Let $\alpha'$ and $\alpha''$ be two prefixes of $\beta$ such that $S^{\alpha'}[r]$
and $S^{\alpha''}[r]$ are nonempty. It suffices to show that $|mid(S^{\alpha''}[r]) - mid(S^{\alpha'}[r])| \le \frac{1}{2} diam(S^{\beta}[r])$.
Without loss of generality, suppose $\alpha''$ is a prefix of $\alpha'$. By Lemma 3.2, $S^{\alpha''}[r] \subseteq S^{\alpha'}[r] \subseteq S^{\beta}[r]$.
Suppose first that $mid(S^{\alpha'}[r]) \le mid(S^{\alpha''}[r])$. Thus, $mid(S^{\alpha'}[r]) \le mid(S^{\alpha''}[r]) \le
\max(S^{\alpha''}[r]) \le \max(S^{\alpha'}[r])$. Hence

$|mid(S^{\alpha''}[r]) - mid(S^{\alpha'}[r])| \le \frac{1}{2} diam(S^{\alpha'}[r]) \le \frac{1}{2} diam(S^{\beta}[r])$,

as needed. A symmetric argument, using minima instead of maxima, applies if $mid(S^{\alpha'}[r]) > mid(S^{\alpha''}[r])$. ■

The following lemma guarantees that suggestions will become closer with each phase; it
will be used together with Lemma 3.9 to ensure wait-freedom.

Lemma 3.7 For any phase $r \ge 2$, $diam(S^{\beta}[r]) \le \frac{1}{2} diam(S^{\beta}[r-1])$.

Proof: By Corollary 3.5, $range(S^{\beta}[r]) \subseteq range(mids(\beta, r-1))$. Thus,

$diam(S^{\beta}[r]) \le diam(mids(\beta, r-1)) \le \frac{1}{2} diam(S^{\beta}[r-1])$, by Lemma 3.6. ■


Lemma 3.8 If some process returns $x$ in phase $r$ and $y \in S^{\beta}[r]$, then $y \in n_\varepsilon(x)$.

Proof: Assume $p_i$ returns $x$ in phase $r$, and assume, by way of contradiction, that there exist
processes with suggestions for phase $r$ that are not in $n_\varepsilon(x)$. Let $p_j$ be one of these processes
with the property that $scan_j^{r-1}$ is the earliest among the $scan^{r-1}$ of these processes; let $\alpha$ be the
shortest prefix of $\beta$ that includes $scan_j^{r-1}$. Let $y = S_j^{\beta}[r]$; by assumption, $y \notin n_\varepsilon(x)$.

By the way $p_j$ was chosen, there is no $update_k^r(y')$ with $y' \notin n_\varepsilon(x)$ in $\alpha$; thus, $range(S^{\alpha}[r]) \subseteq
n_\varepsilon(x)$. Let $r'$ be the maximum phase number read in $scan_j^{r-1}$. It must be that $r' \le r - 1$, since
otherwise $p_j$ would have set its suggestion for phase $r$ to be in $n_\varepsilon(x)$. Since process $p_j$ reads
its own sequence, $r' = r - 1$.

The fact that $r' = r - 1$ also implies that $scan_j^{r-1}$ precedes $update_i^r(x)$. Let $\alpha'$ be the shortest
prefix of $\beta$ that includes $scan_i^r$. Since $update_i^r(x)$ precedes $scan_i^r$, it follows that $scan_j^{r-1}$ precedes
$scan_i^r$, i.e., $\alpha$ is a prefix of $\alpha'$.

Since process $p_i$ returns in phase $r$, it follows from the code that $range(S^{\alpha'}[r-1]) \subseteq n_\varepsilon(x)$.
Since $r - 1$ is the maximum phase number read in $scan_j^{r-1}$, it follows that $y = mid(S^{\alpha}[r-1]) \in
range(S^{\alpha}[r-1])$. However, by Lemma 3.2, $S^{\alpha}[r-1] \subseteq S^{\alpha'}[r-1]$, and thus $y \in n_\varepsilon(x)$, a
contradiction. ■

Lemma 3.9 For any phase $r \ge 1$, if $diam(S^{\beta}[r]) \le \varepsilon$, then every nonfaulty process returns
no later than phase $r + 1$.

Proof: From the code of the algorithm it follows that every nonfaulty process either returns
or reaches phase $r + 1$. If $diam(S^{\beta}[r]) \le \varepsilon$ it follows from Corollary 3.4 that $diam(S^{\beta}[r+1]) \le \varepsilon$.

The proof proceeds by induction on the order in which processes perform $scan^{r+1}$. For the
base case, let $p_i$ be the first process to perform $scan^{r+1}$. Clearly, $p_i$ has $phase_i = r + 1 =$
max-phase, and by assumption $r + 1 \ge 2$. Also, $diam(S^{\beta}[r])$ and $diam(S^{\beta}[r+1])$ are less than
or equal to $\varepsilon$, and thus, $p_i$ will pass the test in Line 5 and will return in phase $r + 1$. The
induction step is similar, and uses the fact that so far no process has advanced beyond phase
$r + 1$ to show that any process that reaches phase $r + 1$ passes the test in Line 5 and returns
in phase $r + 1$. ■

Thus we have proved:


Theorem 3.10 Procedure wait-free-approx is a wait-free algorithm for the approximate agreement
problem whose running time on input $(x_0, \ldots, x_{n-1})$ is at most

$O\left(n^2 \log\left(\frac{diam(\{x_0, \ldots, x_{n-1}\})}{\varepsilon}\right)\right)$.

Proof: The validity condition clearly holds, since processes decide only on their suggestions
and these are always within the range of the inputs (Corollary 3.4).

To show agreement, assume that $r$ is the minimum phase in which some process returns, and
let $p_i$ be a process that returns $x$ in phase $r$. By Lemma 3.8, the suggestions of all processes
for phase $r$ are in $n_\varepsilon(x)$. By Corollary 3.4, the same is true for phase $r + 1$. By Lemma 3.9, all
nonfaulty processes return no later than phase $r + 1$, and thus, all nonfaulty processes return
either in phase $r$ or in phase $r + 1$. Since processes return only their suggestions, all returned
values are in $n_\varepsilon(x)$, as needed.

Since the diameter of suggestions decreases by a factor of two with each phase (by Lemma 3.7)
it will eventually be smaller than $\varepsilon$ and, by Lemma 3.9, each process will eventually decide.
This guarantees wait-freedom.

To show the time bound, notice that, by Lemma 3.7, after $O(\log(diam(\{x_0, \ldots, x_{n-1}\})/\varepsilon))$ phases,
processes will have very close suggestions; by Lemma 3.9, all processes will return. The time
it takes a process to execute each phase is bounded from above by the number of operations it
executes. Using the implementation of atomic snapshots from [1], this is bounded by $O(n^2)$. ■

Since the input range is not bounded and $\varepsilon$ may be arbitrarily small, the running time
of the algorithm as a function of $n$ is actually unbounded. Note that the time complexity
in the execution where processes operate synchronously starting with inputs $(x_0, \ldots, x_{n-1})$ is
$\Omega(n \log(diam(\{x_0, \ldots, x_{n-1}\})/\varepsilon))$.⁷
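
As a rough illustration of where the logarithmic factor comes from, the sketch below counts how many halvings of the input diameter are needed before it drops to at most ε (Lemma 3.7 gives one halving per phase). The function name is hypothetical, and the count ignores the extra phase allowed by Lemma 3.9 and the per-phase cost of the snapshot operations.

```python
def halvings_until_close(inputs, eps):
    """Smallest k such that diam(inputs) / 2**k <= eps."""
    d = max(inputs) - min(inputs)
    k = 0
    while d > eps:
        d /= 2  # one phase halves the diameter of the suggestions
        k += 1
    return k


wide = halvings_until_close([0, 8], 1)       # 8 -> 4 -> 2 -> 1
narrow = halvings_until_close([5, 5], 0.1)   # inputs already agree
```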

4  The  Bias  Function 

The algorithms in Sections 5 and 6 return a decision value by performing a calculation based
on an input value and a corresponding counter for each process. We name the calculated
function bias, as the returned decision value is biased towards (i.e., is closer to) the input value
associated with the process having the largest corresponding counter. Before presenting the
algorithms, we present the function and explain its properties. The proofs of these properties
are purely arithmetic, involving no arguments about synchronization between processes, and
have therefore been deferred to Section 9.

⁷ The discrepancy between this bound and the bound in the theorem is due to the fact that tighter bounds
have not been proven for the time to execute operations in the implementation of atomic snapshot objects of

function bias(v^0, v^1, c^0, c^1, ε);
begin

1:  if v^0 = v^1 = 0 then return 0

2:  else if c^0 < c^1 then return v^1 + ((v^0 − v^1) / (|v^0| + |v^1|)) · (|v^1| − min{c^1 ε, |v^1|})

3:  else return v^0 + ((v^1 − v^0) / (|v^0| + |v^1|)) · (|v^0| − min{c^0 ε, |v^0|})

fi;

end;


Figure 3: The bias function. Code for process $p_i$.

In order to understand the nature of the calculation performed by the bias function, we
briefly explain the structure of the algorithms using it. The new algorithms are conceptually
based on the following high-level two-process algorithm. A process $p_1$ (similarly $p_0$), knowing
only its own input value $v^1$, will repeatedly take incremental steps of size $\varepsilon$, starting at 0 and
ending upon reaching the value $v^1$, unless it reads that the other process $p_0$ has also moved.
In the former case it decides on $v^1$, and in the latter case its decision value is a function of the
relative number of incremental steps both processes managed to take before each noticed the
other had moved. However, since in either case process $p_1$'s decision must be guaranteed to be
in $range(\{v^0, v^1\})$, it cannot just be a value in the interval $range(\{0, v^1\})$. This is the exact
purpose of the function bias. It provides a mapping from the processes' incremental walks in
the intervals $range(\{0, v^0\})$ and $range(\{0, v^1\})$ respectively, to walks of proportional length
in the allowed $range(\{v^0, v^1\})$. The code of bias appears in Figure 3. The function takes as
inputs two real number values $v^0$ and $v^1$, two associated counters, $c^0$ and $c^1$ (integers denoting
the number of incremental steps each process $p_0$ or $p_1$ took), and $\varepsilon$.

An example of the translation defined by bias is given in Figure 4 for the case $0 < v^0 < v^1$.
Assume $p_0$ traversed a distance of length $c^0 \cdot \varepsilon$ away from 0 towards $v^0$, and $p_1$ a distance
of length $c^1 \cdot \varepsilon$ away from 0 towards $v^1$. The bias function maps the respective distances of
length $c^0 \cdot \varepsilon$ and $c^1 \cdot \varepsilon$ (within the interval $[-v^0, v^1]$) into distances of proportional length in the
interval $[v^0, v^1]$. The starting point 0 in $[-v^0, v^1]$ is replaced by the point new-0 in $[v^0, v^1]$. The
returned decision value is then the point associated with the larger counter (larger traversed
distance).
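
The mapping can be sketched in a few lines of Python. The scaling factor $(v^0 - v^1)/(|v^0| + |v^1|)$ used below is an assumption: it is a reconstruction chosen to be consistent with the description above and with Lemmas 4.1 and 4.2, not a verbatim transcription of Figure 3.

```python
def bias(v0, v1, c0, c1, eps):
    """Sketch of the bias mapping (scaling factor is an assumed reconstruction)."""
    if v0 == 0 and v1 == 0:
        return 0.0
    total = abs(v0) + abs(v1)
    if c0 < c1:
        # Walk from new-0 towards v1 by the distance p1 has covered, rescaled.
        return v1 + (v0 - v1) / total * (abs(v1) - min(c1 * eps, abs(v1)))
    # Otherwise walk from new-0 towards v0 by the distance p0 has covered.
    return v0 + (v1 - v0) / total * (abs(v0) - min(c0 * eps, abs(v0)))


new_zero = bias(2, 6, 0, 0, 1)       # nobody moved: the point new-0
towards_v1 = bias(2, 6, 0, 4, 1)     # p1 took 4 steps of size 1
reached_v1 = bias(2, 6, 0, 6, 1)     # p1 completed its walk
```

With inputs 2 and 6, new-0 splits the interval [2, 6] in the same proportion in which 0 splits [-2, 6], and a completed walk returns the walker's own input.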

We now introduce several lemmas that formally outline the properties of the bias function
and on which the correctness proofs of the algorithms in the sequel will be based. The first is a
rather simple statement, namely, that the returned value of any call to bias is in $range(\{v^0, v^1\})$.

Lemma 4.1 Let $c^0, c^1$ be nonnegative integers, and $v^0, v^1, \varepsilon$ be real numbers, with $\varepsilon > 0$. Then
$bias(v^0, v^1, c^0, c^1, \varepsilon) \in range(\{v^0, v^1\})$.

Figure 4: The bias mapping.


The next three lemmas have to do with an additional property required of the bias function:
that the values returned by different calls to bias always be approximately the same, even
if the counter parameter values or the real parameter values used in these calls are slightly
different. The following first lemma states that applying bias to counters $c^0$ and $c^1$ that are
only approximately the same, yet with exactly the same real numbers $v^0$, $v^1$ and $\varepsilon$, results in
returned values that are approximately the same.

Lemma 4.2 Let $c^0, c^1$ be nonnegative integers, and $v^0, v^1, \varepsilon, m$ be real numbers, $\varepsilon > 0$, $m > 0$.

(1) Suppose $c^1 > c^0$ and $|v^1|/\varepsilon - m \le c^1$. Then $|bias(v^0, v^1, c^0, c^1, \varepsilon) - v^1| \le m\varepsilon$.

(2) Suppose $c^0 \ge c^1$ and $|v^0|/\varepsilon - m \le c^0$. Then $|bias(v^0, v^1, c^0, c^1, \varepsilon) - v^0| \le m\varepsilon$.

The next lemma shows that the results of two calls to bias with “close” (in a sense made
precise by the lemma) values for $c^0, c^1$, and the same $v^0, v^1, \varepsilon$, are “close”.

Lemma 4.3 Let $c_0^0, c_0^1, c_1^0, c_1^1$ be nonnegative integers, and $v^0, v^1, \varepsilon, m$ be real numbers, $\varepsilon > 0$
and $m > 0$. Suppose $\min\{c_0^0, c_0^1\} = \min\{c_1^0, c_1^1\} = 0$ and $|c_0^0 - c_1^0| + |c_0^1 - c_1^1| \le m$. Then

$|bias(v^0, v^1, c_0^0, c_0^1, \varepsilon) - bias(v^0, v^1, c_1^0, c_1^1, \varepsilon)| \le m\varepsilon$.

The last lemma in this section states that applying bias, this time to real numbers $v^0$ and
$v^1$ that are approximately (to within a $\delta$ factor) the same, yet with exactly the same counters
$c^0, c^1$ and $\varepsilon$, results in values that are approximately the same.

Lemma 4.4 Let $c^0, c^1$ be nonnegative integers, and $v_0^0, v_0^1, v_1^0, v_1^1, \varepsilon, \delta$ be real numbers, with
$\varepsilon > 0$, $\delta > 0$. Suppose $|v_0^0 - v_1^0| \le \delta$ and $|v_0^1 - v_1^1| \le \delta$. Then

$|bias(v_0^0, v_0^1, c^0, c^1, \varepsilon) - bias(v_1^0, v_1^1, c^0, c^1, \varepsilon)| \le 6\delta$.


5  Fast  2-Process  Approximate  Agreement 

We  now  show  that,  for  two  processes,  there  exists  an  approximate  agreement  algorithm  whose 
time complexity is constant; i.e., it does not depend on the range of input values or $\varepsilon$. The
n-process  algorithm  presented  in  Section  6,  when  specialized  to  the  case  n  =  2,  also  yields  a 
(somewhat  larger)  constant  time  complexity.  We  present  this  algorithm  because  we  believe  its 
simplicity  will  help  the  reader  develop  an  intuition  for  the  ideas  that  will  be  later  used  in  the 
general  algorithm. 

5.1  Informal  Description 

The key ideas underlying this algorithm are as follows. A process, $p_i$, running on its own,
can assume that either it is running very fast (and not much time has elapsed), or the other
process, $p_{\bar\imath}$, has failed. Thus, $p_i$ may take an unlimited number of steps without degrading
the time complexity for failure-free executions, as long as $p_{\bar\imath}$ does not perform any steps. Of
course, if $p_{\bar\imath}$ does not take any steps at all, then, in order to guarantee the wait-free property,
$p_i$ must eventually decide (unilaterally) on its own value. In this case, in order to guarantee
correctness, it is necessary that if and when $p_{\bar\imath}$ does appear, it must be able to know, just by
reading $p_i$'s registers, what $p_i$ has decided. However, an inherent difficulty of programming
asynchronous systems is that, due to the uncertainty of interleaving, at least one process $p_i$
has an “uncertainty of one step,” namely, it cannot tell whether $p_{\bar\imath}$ read the value written in
$p_i$'s latest write or the value written in $p_i$'s preceding write. A two-process solution that halves
the distance between the suggested values is thus of no use, since the “uncertainty of one step”
can cause processes to decide on values that are more than $\varepsilon$ apart. Our solution is to have
a process change its suggestions gradually with each step, more precisely, by an amount less
than $\varepsilon$, so that the “uncertainty of one step” will result only in $\varepsilon$ inaccuracy in the decision
value.


5.2  The  Algorithm 

The code for process $p_i$ is given in Figure 5. Each process $p_i$, $i \in \{0, 1\}$, maintains a single-writer
multi-reader atomic register with two fields: $V_i$, the input value, a real number, and
$C_i$, the counter, an integer. Each process starts by writing its input and initializing a counter
in the shared memory (Line 1 in increase-counter). It then keeps incrementing this counter
until either it has taken a number of steps proportional to the absolute value of its input, or
the other process has taken a step, whichever happens first (Line 2 of increase-counter). When
the process stops, it collects all the $C$ and $V$ values and applies the function bias to get a
decision value. As described in the previous section, the decision is within the input range and
biased towards the input value of the process with the larger counter. In particular, if a process
runs to completion without observing the other process, it decides on its own input value. We
show that the discrepancy in the reading of the counters among the two processes is at most
1, and thus, based on the properties of the bias function, the decisions based on the values of
the counter will differ by at most $\varepsilon$.

function fast-2-approx(x, ε);
begin

1:  increase-counter(x, |x|/ε);

2:  (v^0, v^1, c^0, c^1) := (V_0, V_1, C_0, C_1);

3:  if c^ī = ⊥ then return v^i

4:  else return bias(v^0, v^1, c^0, c^1, ε);

end;

function increase-counter(v, max);
begin

1:  (V_i, C_i) := (v, 0);

2:  while C_ī = ⊥ and C_i < max do C_i := C_i + 1 od;

end;


Figure 5: Fast wait-free 2-process approximate agreement. Code for process $p_i$.
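
The behaviour described in this section can be exercised with a small simulation. This is only a sketch under stated assumptions: the bias formula is the assumed reconstruction rather than a verbatim copy of Figure 3, `None` stands for ⊥, each `yield` marks one atomic access to the other process's register, and the names `process` and `run` and the schedule encoding are hypothetical.

```python
BOTTOM = None  # stands for the initial value ⊥


def bias(v0, v1, c0, c1, eps):
    # Assumed reconstruction of the bias function.
    if v0 == 0 and v1 == 0:
        return 0.0
    total = abs(v0) + abs(v1)
    if c0 < c1:
        return v1 + (v0 - v1) / total * (abs(v1) - min(c1 * eps, abs(v1)))
    return v0 + (v1 - v0) / total * (abs(v0) - min(c0 * eps, abs(v0)))


def process(i, x, eps, V, C):
    """fast-2-approx for p_i as a generator; one shared access per step."""
    other = 1 - i
    V[i], C[i] = x, 0                 # Line 1 of increase-counter
    yield
    while True:
        seen = C[other]               # read the other counter
        yield
        if seen is not BOTTOM or C[i] >= abs(x) / eps:
            break
        C[i] += 1                     # write own counter
        yield
    v, c = list(V), list(C)           # Line 2 of fast-2-approx
    yield
    if c[other] is BOTTOM:
        return v[i]                   # Line 3
    return bias(v[0], v[1], c[0], c[1], eps)  # Line 4


def run(x0, x1, eps, schedule):
    """Run both processes; `schedule` lists which process steps next."""
    V, C = [BOTTOM, BOTTOM], [BOTTOM, BOTTOM]
    procs = [process(0, x0, eps, V, C), process(1, x1, eps, V, C)]
    result, done = [None, None], [False, False]

    def step(i):
        try:
            next(procs[i])
        except StopIteration as stop:
            result[i], done[i] = stop.value, True

    for i in schedule:
        if not done[i]:
            step(i)
    for i in (0, 1):                  # let stragglers finish, p0 first
        while not done[i]:
            step(i)
    return result


solo = run(3, 7, 1, [0] * 20)               # p0 finishes before p1 starts
lockstep = run(2, 6, 1, [0, 1] * 10)        # strictly alternating steps
skewed = run(2, 6, 1, [0, 0, 1, 1, 1, 0])   # p1 reads C_0 one step early
```

In the third schedule $p_1$ reads $C_0$ just before $p_0$'s last increment, so the two processes evaluate bias with counters that differ by one and return values within ε of each other.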

5.3  Correctness  Proof 

An execution of the algorithm can be viewed as a sequence of primitive atomic operations that
are reads and writes of atomic registers (and may include changing local data). Fix some
execution $\alpha$ of the algorithm. All lemmas in the rest of this section are stated with respect
to $\alpha$. The next lemma shows a crucial property of the values of the counters used by the two
processes. In this lemma $\bot$ is treated as $-1$.

Lemma 5.1 Assume $p_0$ and $p_1$ return from fast-2-approx. Let $i \in \{0, 1\}$, and let $c_i$ and $c_{\bar\imath}$ be
the values of $C_i$ read by $p_i$ and $p_{\bar\imath}$, respectively, in Line 2 of fast-2-approx. Then, $c_i - 1 \le c_{\bar\imath} \le c_i$.

Proof: Since $p_i$ returns, it must be that $p_i$ writes to $C_i$. Let $\pi_i$ be the last write by $p_i$ to $C_i$
in $\alpha$. Since increase-counter returns after the last write to $C_i$ and $p_i$ is the only one to modify
$C_i$, it follows that $c_i$ is the value written to $C_i$ in $\pi_i$. Let $\phi_{\bar\imath}$ be the read by $p_{\bar\imath}$ of $C_i$ in Line 2
of fast-2-approx. Note that $c_{\bar\imath}$ is the value returned in $\phi_{\bar\imath}$. Since $C_i$ is atomic, it is clear that
$c_{\bar\imath} \le c_i$. We now show that $c_i - 1 \le c_{\bar\imath}$.

If $c_i = 0$ then since $c_{\bar\imath} \le c_i$, $c_{\bar\imath} \in \{\bot, 0\}$; since $\bot$ is mapped to $-1$, the claim follows. So
assume $c_i > 0$. Let $\pi'_i$ be the penultimate write by $p_i$ to $C_i$, writing $c_i - 1$. Let $\phi_i$ be the latest
read of $C_{\bar\imath}$ by $p_i$ that precedes $\pi_i$; note that $\pi'_i$ precedes $\phi_i$. It must be that the value read in
$\phi_i$ is $\bot$. Let $\pi_{\bar\imath}$ be the write of 0 by $p_{\bar\imath}$ to $C_{\bar\imath}$ in $\alpha$. From the code, it follows that $\pi_{\bar\imath}$ precedes $\phi_{\bar\imath}$.
Since the value read in $\phi_i$ is $\bot$, it follows from the atomicity of $C_{\bar\imath}$ that $\phi_i$ precedes $\pi_{\bar\imath}$. Thus,
$\pi'_i$ precedes $\phi_{\bar\imath}$. From the atomicity of $C_i$ it follows that $c_i - 1 \le c_{\bar\imath}$. ■

We  can  now  prove  that  the  algorithm  satisfies  the  agreement  property: 

Lemma 5.2 If fast-2-approx$_0$ returns $y_0$ and fast-2-approx$_1$ returns $y_1$ then $|y_0 - y_1| \le \varepsilon$.

Proof: The proof of this lemma is separated into two cases. In one case, we apply Lemma 4.2.
In the other case, we show that the sum of the differences between the values of $c^0$ and $c^1$ used
by $p_0$ and by $p_1$ is at most 1, and appeal to Lemma 4.3. The details follow.

Denote by $\pi_i$ the first write by $p_i$ to $C_i$, writing 0, for $i \in \{0, 1\}$. Since both processes
decide, both $\pi_0$ and $\pi_1$ must appear in $\alpha$. Assume, without loss of generality, that $\pi_0$ precedes
$\pi_1$. (The other case is symmetric.) Assume that process $p_0$ reads $(v_0^0, v_0^1, c_0^0, c_0^1)$ in Line 2 before
deciding, and that process $p_1$ reads $(v_1^0, v_1^1, c_1^0, c_1^1)$ in Line 2 before deciding. Note that, since
$p_i$ first writes 0 to $C_i$ and then reads $C_i$, it must be that $c_i^i \ge 0$, for $i \in \{0, 1\}$.

Let $\phi$ be any read of $C_0$ by $p_1$, returning some value $z$. The code of the algorithm implies
that $\pi_1$ precedes $\phi$. Since $\pi_0$ precedes $\pi_1$, $\pi_0$ precedes $\phi$. By the atomicity of $C_0$, this implies
that $z \ge 0$. This implies, in particular, that $c_1^0 \ge 0$, and thus, fast-2-approx$_1$ returns in Line 4.
In addition, this also implies that $p_1$ will not increase $C_1$ beyond 0, and thus, by the atomicity
of $C_1$, $c_1^1 = 0$ and $c_0^1 \in \{\bot, 0\}$. We separate the rest of the proof into two cases:

Case 1: $c_0^1 = \bot$. In this case, fast-2-approx$_0$ returns $v_0^0 = x_0$ in Line 3. The code of increase-counter
implies that $|x_0|/\varepsilon \le c_0^0$. Lemma 5.1 implies that $|x_0|/\varepsilon - 1 \le c_1^0$. Also, $v_1^0 = x_0$. Since
$c_1^0 \ge 0 = c_1^1$, we can apply Lemma 4.2(2) with $m = 1$ and get that $|bias(v_1^0, v_1^1, c_1^0, c_1^1, \varepsilon) - v_0^0| \le \varepsilon$,
as needed.

Case 2: $c_0^1 = 0$. Then fast-2-approx$_0$ returns in Line 4 and $v_0^1 = v_1^1$. We have that $\min\{c_0^0, c_0^1\} =
c_0^1 = 0$ and $\min\{c_1^0, c_1^1\} = c_1^1 = 0$. Also, $|c_0^0 - c_1^0| + |c_0^1 - c_1^1| = |c_0^0 - c_1^0| \le 1$, by Lemma 5.1. The
claim follows by applying Lemma 4.3 with $m = 1$. ■

We  have: 

Theorem 5.3 Procedure fast-2-approx is a wait-free algorithm for the 2-process approximate
agreement problem whose time complexity is $O(1)$.

Proof: Agreement follows from Lemma 5.2. It follows from the code and from Lemma 4.1
that the values returned are in the range of the original input values; hence the validity
property is satisfied. Each process $p_i$ executes at most $O(|x_i|/\varepsilon)$ steps before deciding; thus,
the algorithm is wait-free. Since each process executes a constant number of (its own) steps
after the other process performs its first step, the time complexity of this algorithm is $O(1)$. ■

6  Fast  n-Process  Approximate  Agreement


In this section, we present a fast ($O(\log n)$ time) wait-free approximate agreement algorithm
for $n$ processes. The algorithm is based on an alternated-interleaving method of integrating
wait-free (resilient but slow) and non-wait-free (fast but not resilient) algorithms to obtain new
algorithms that are both resilient and fast.

We begin by showing how one can reduce, in constant time, the problem of $n$-process
approximate agreement with arbitrary input values to a special case of the problem where the
set of input values is included in the union of two small intervals. We do this by performing
an alternated-interleaving of a wait-free and a non-wait-free algorithm. We then show, again
based on an alternated-interleaving of wait-free and non-wait-free algorithms, that $n$ processes
with values in two small intervals can “simulate,” in $O(\log n)$ time, two virtual processes
running the fast approximate agreement algorithm of Section 5, thus solving the approximate
agreement problem for $n$ processes and any two values. Combining the two algorithms yields
an $O(\log n)$ wait-free approximate agreement algorithm.

The second part of the algorithm relies on procedures for synchronization and input collection
with $O(\log n)$ time complexity. These procedures are presented in Section 6.3.


6.1  Informal  Description 

The first part of the algorithm, the one that achieves the constant-time reduction to two small
intervals, is encapsulated in procedure n-to-2 (Figure 6). The idea is simple: interleave the
execution of the slow wait-free-approx procedure with that of the fast wait-approx. The resulting
algorithm is wait-free since even if $n - 1$ processes fail, wait-free-approx will terminate. It takes
at most $O(1)$ time in the failure-free execution since wait-approx terminates within $O(1)$ time.
However, some processes (group $a$) might finish the alternated execution with a value from
wait-free-approx, while others (group $b$) finish with a value from wait-approx. We thus did not
solve the approximate agreement problem, but we did guarantee that the values are included
in the union of two small intervals. The procedure returns an output value $v_i$ and a group
$g_i \in \{a, b\}$ to which $p_i$ is said to belong. It is guaranteed that output values for processes in
the same group $g_i \in \{a, b\}$ are at most $\varepsilon/12$ apart.

The second part of the algorithm solves $n$-process approximate agreement in $O(\log n)$ time,
assuming that processes are partitioned into two groups with approximately the same value in
each group. The solution is based on having the processes in group $a$ (resp. $b$) jointly simulate
a virtual process $p_0$ (resp. $p_1$) that executes the function fast-2-approx of Figure 5.

The following straightforward simulation is expressed by Lines 1-2 of the function increase-counter
in Figure 6. The counter $C_0$ of fast-2-approx is replaced by a joint counter, which is
defined to be the sum of local counters $C_i$, for all $i$ in group $a$. Each step of the simulated
counter $C_0$ is implemented by $O(n)$ steps of the joint counter for $a$. Each step of this joint
counter is, in turn, implemented by a single step of one of the individual counters in group $a$.
Similarly, the processes in group $b$ simulate counter $C_1$ of fast-2-approx. In Line 2 of increase-counter,
in order to decide on the values of the joint counters of $a$ and $b$, a process reads the
values of all local counters. If the counter simulated by $p_i$'s group is not large enough and the
counter simulated by the other group is $\bot$, then $p_i$ advances the counter simulated by its group
(by incrementing its local counter $C_i$), and repeats. Otherwise, $p_i$ exits increase-counter.

One can see that, in an execution where processes operate synchronously, each iteration of
the while loop in Line 2 of increase-counter has $O(n)$ time complexity since reading all memory
locations to calculate the simulated counter takes $O(n)$ steps. However, one can improve the
time complexity based on the following observation. If $p_i$ ever detects that all processes have
set their counters (in Line 1 of increase-counter), then it knows that one of the following holds:
either some process from the other group has set its local counter (and hence that group's
simulated counter) to a value other than $\bot$, or the other group is empty. In the former case,
the loop condition in Line 2 must be false, while in the latter case, the final value for the
other group's counter will be $\bot$. In either case, $p_i$ can stop executing increase-counter, and be
guaranteed to correctly simulate the behavior of the 2-process algorithm. In order to detect
in less than $O(n)$ time that all processes have set their counters, we use an $O(\log n)$ non-wait-free
synch procedure, described in Section 6.3, whose termination ensures this condition. To
achieve the better time, the algorithm alternates synch with the (wait-free) loop in Line 2 of
increase-counter.

The  delicate  synchronization  provided  by  synch  and  its  effect  on  the  rest  of  the  algorithm 
guarantee  that  after  some  process  exits  increase-counter,  individual  counter  values  increase  at 
most  by  3.  Thus,  after  exiting  increase-counter,  a  process  can  perform  an  O(logn)  wait-free 
fast-collect,  described  in  Section  6.3,  in  order  to  collect  all  the  values  needed  to  decide  on  the 
returned  value  in  Lines  3-4.  The  above  property  ensures  that  the  simulated  counter  values 
used  by  different  processes  do  not  differ  much. 

6.2  The  Algorithm 

The code for the algorithm is presented in Figure 6. Alternated procedures are enclosed
within begin-alternate and end-alternate brackets. This construct means that the algorithm
alternates strictly between executing single steps of the two alternated procedures, and
terminates the first time one of the procedures terminates.⁸ When an alternation is used in
an assignment statement, the value assigned is the value returned by the procedure that terminates
first. The algorithm uses the bias procedure of Figure 3. In addition to the shared
data structures used by wait-free-approx and wait-approx, process $p_i$, $i \in \{0, \ldots, n-1\}$, has
a single-writer multi-reader atomic register with the following fields: $V_i$, the value returned
in $p_i$'s first phase; $G_i$, denoting the group to which $p_i$ belongs; $C_i$, $p_i$'s contribution to its
group's counter; and $p_i$'s boolean synch termination flag.
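
The alternation construct can be pictured with generators, where each `yield` marks the end of one step of a procedure. This is only an illustrative sketch: the function `alternate` and the toy procedure `countdown` are hypothetical names, not part of the paper's code.

```python
def alternate(first, second):
    """Strictly alternate single steps of two generators.

    Returns the value of whichever procedure terminates first.
    """
    procs = (first, second)
    turn = 0
    while True:
        try:
            next(procs[turn])          # run one step of the current procedure
        except StopIteration as stop:
            return stop.value          # first procedure to finish wins
        turn = 1 - turn


def countdown(steps, label):
    """Toy procedure that takes `steps` steps and then returns `label`."""
    for _ in range(steps):
        yield
    return label


winner = alternate(countdown(5, "slow"), countdown(2, "fast"))
```

Everything runs inside one process: the loop simply interleaves steps locally, and the unfinished procedure is abandoned once the other returns.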

⁸ We remark that this is just a coding convenience, used to simplify the control structure of the algorithm.
It is implemented locally at one process and does not cause spawning of new processes.

In the code we abuse notation and denote by $V^g$, where $g$ is a group's name, the “group's
value” calculated as follows: if $g = g_i$ then it is $V_i$, and if $g \ne g_i$ then it is an arbitrary $V_j$ such
that $p_j$ is in group $g$ if there is any, and $\bot$ otherwise. The value $v^g$ is calculated in a similar
manner from the corresponding local copies. (Recall our convention that lower case letters
stand for local variables and upper case letters for shared variables.) When $g$ is a group name,
$\bar{g}$ denotes the other group's name, e.g., if $g = a$ then $\bar{g} = b$. The notation $C^g$, for $g \in \{a, b\}$,
stands for the sum of those $C_i$ such that $G_i = g$ and $C_i \ne \bot$, if there is any such $C_i$, and $\bot$
otherwise. The value $c^g$ is calculated in a similar manner from the corresponding local copies.
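
As an illustration of the joint-counter notation, the sketch below computes $C^g$ from arrays of group names and local counters. Representing ⊥ by `None` and the registers by plain lists are assumptions made for the example; `group_counter` is a hypothetical helper name.

```python
BOTTOM = None  # stands for ⊥


def group_counter(G, C, g):
    """Sum of the counters C[i] with G[i] == g and C[i] != ⊥; ⊥ if none exist."""
    values = [c for grp, c in zip(G, C) if grp == g and c is not BOTTOM]
    return sum(values) if values else BOTTOM


# Four processes: two in group a, one in group b with an unset counter,
# and one that has not written its register yet.
G = ["a", "b", "a", BOTTOM]
C = [2, BOTTOM, 3, BOTTOM]
joint_a = group_counter(G, C, "a")
joint_b = group_counter(G, C, "b")
```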


6.3  Fast  Information  Collection  and  Synchronization 

We now present the procedures for information collection and synchronization and prove their
properties. We start with a wait-free algorithm for input collection, that is, returning the current
values in the entries of an array $R$. The time complexity of the algorithm is $O(\log n)$.

This problem is interesting on its own as it underlies any problem of computing a function,
e.g., max or sum, on a set of initial values that reside in the shared memory.⁹ Once a process
collects all the values, computing the function can be done locally in constant time. Since
$\Omega(\log n)$ is a lower bound on the time for the information collection problem (see, e.g., [11]),
this implies that for problems whose output depends on all the initial values in memory, and
only on them, there exists an optimally fast wait-free solution.

Our algorithm, presented in Figure 7, is a wait-free variation of the pointer-jumping technique
used in PRAM algorithms (e.g., [49]). For sequences $R$, $R'$ and a nonnegative integer $n$
we define concatenate($R$, $R'$) as returning the concatenation of $R'$ to $R$, and truncate($R$, $n$) as
returning the first $n$ elements of $R$ if $|R| \ge n$, and $R$ otherwise. The initial value $\bot$ is treated
like any other value and may be returned by the algorithm for entries that have not yet been
set.


Fix some execution $\alpha$ of the fast-n-approx algorithm. We clearly have:

Theorem 6.1 Assume fast-collect$_i$ is invoked by $p_i$ in $\alpha$, and let $\alpha'$ be the shortest prefix of
$\alpha$ that includes an invocation of fast-collect. Then fast-collect$_i$ returns a vector containing, for
each $p_j$, a value that appears in $R_j$ at some point at or after $\alpha'$. Moreover, fast-collect$_i$ returns
within at most $2n$ steps by $p_i$.

The  next  lemma  is  the  crux  of  the  time  analysis  for  this  algorithm. 

Let $t$ be the time of the last event in the shortest finite prefix of $\alpha$ that includes an invocation
of fast-collect by every $p_i$, $i \in \{0, \ldots, n-1\}$, if such a prefix exists, and $\infty$ otherwise.

⁹ Note that these problems are very different from the decision problems considered until now in this paper,
where inputs are local to the processes and do not reside in the shared memory.


function fast-n-approx($x$, $\varepsilon$);
begin
0:    ($v$, $g$) := n-to-2($x$, $\varepsilon$);
1:    increase-counter($v$, $g$, $|v| \cdot 6n/\varepsilon$);
2:    ($v$, $g$, $c$) := fast-collect($V$, $G$, $C$);
3:    if $c^{\bar{g}} = \bot$ then return $v^g$
4:    else return bias($v^a$, $v^b$, $c^a$, $c^b$, $\varepsilon/6n$);
end;

function n-to-2($x$, $\varepsilon$);
begin
      ($v$, $g$) := begin-alternate
1:        (wait-free-approx($x$, $\varepsilon/12$), $a$)
      and
2:        (wait-approx($x$), $b$);
      end-alternate;
3:    return ($v$, $g$)
end;

function increase-counter($v$, $g$, max);
begin
1:    ($V_i$, $G_i$, $C_i$) := ($v$, $g$, 0);
      begin-alternate
2:        while $C^{\bar{g}} = \bot$ and $C^g <$ max do $C_i := C_i + 1$ od;
      and
3:        synch($C$);
      end-alternate;
4:    $T_i$ := true;
end;

Figure 6: Fast wait-free n-process approximate agreement. Code for process $p_i$.


function fast-collect($R$);
begin
1:    $l := 1$;
2:    while $l < n$ do
3:        $R_i$ := concatenate($R_i$, $R_{(i+l) \bmod n}$);
4:        $l := |R_i|$;
      od;
5:    return truncate($R_i$, $n$);
end;

Figure 7: Fast wait-free information collection. Code for process $p_i$.


Lemma 6.2 Assume $t < \infty$. For every $i \in \{0, \ldots, n-1\}$ and every integer $r$, $0 \le r \le \lceil \log n \rceil$,
$|R_i| \ge \min\{2^r, n\}$ at time $t + 2r$.

Proof: The proof is by induction on $r$. The base case, $r = 0$, is trivial.

For the induction step, assume that $r \ge 1$. If at time $t + 2(r-1)$, $|R_i| \ge n$, the claim
follows. So suppose $|R_i| < n$ at time $t + 2(r-1)$. Then process $p_i$ reads $R_{(i+l_i) \bmod n}$ after
time $t + 2(r-1)$, where $l_i$ is the length of $R_i$ at time $t + 2(r-1)$. By the inductive hypothesis,
$|R_i| \ge 2^{r-1}$ and $|R_{(i+l_i) \bmod n}| \ge 2^{r-1}$ at time $t + 2(r-1)$. It follows that $|R_i| \ge 2^r$ at time
$t + 2r$. ■

In particular, at time $t + 2\lceil \log n \rceil$, we have $|R_i| \ge n$ for every $i$. Thus, fast-collect$_i$ returns
by time $t + 2\lceil \log n \rceil$. We have:

Theorem 6.3 Let $\alpha'$ be a finite prefix of $\alpha$. Assume that in $\alpha'$, fast-collect$_i$ is invoked by $p_i$,
for every $i \in \{0, \ldots, n-1\}$. Then for every $i \in \{0, \ldots, n-1\}$, fast-collect$_i$ returns within at
most $O(\log n)$ time after $\alpha'$.
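The doubling behind this bound can be observed in a simplified lock-step simulation of the collection procedure. This is a sketch under stated assumptions, not the asynchronous algorithm itself: all processes are assumed to move in rounds, each reading the registers as they stood at the end of the previous round, and the name `fast_collect_sync` is hypothetical.

```python
from typing import List, Sequence, Tuple


def fast_collect_sync(values: Sequence) -> Tuple[List[list], int]:
    """Run the pointer-jumping collection in lock-step rounds.

    Returns the truncated view of every process and the number of rounds used.
    """
    n = len(values)
    R = [[v] for v in values]        # R[i] starts as the one-element sequence of p_i
    rounds = 0
    while any(len(r) < n for r in R):
        snapshot = [list(r) for r in R]          # registers at the end of the last round
        for i in range(n):
            l = len(snapshot[i])
            if l < n:                            # loop guard "while l < n"
                R[i] = snapshot[i] + snapshot[(i + l) % n]
        rounds += 1
    return [r[:n] for r in R], rounds            # truncate to n entries
```

With five inputs `a, b, c, d, e` the simulation takes three rounds, and process 2 ends with the view `c, d, e, a, b`: every process sees all values, rotated to start at its own index.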

The synchronization procedure, synch, is a variant of fast-collect. Since it is used within
an alternate construct, it is possible that synch is aborted without completing and returning
“normally.” To cope with this possibility, we associate with the shared array $R$ to which synch
is applied a special termination array $T$, whose entries can take on values in $\{\bot, \text{true}\}$. $T_i$ is set
to true if $p_i$ terminates synch$_i$, i.e., aborts it or returns from it. The synchronization procedure
guarantees that if it returns, then either all the entries of the array are non-$\bot$ values, or for
some $j$, $T_j$ = true. It is not wait-free. The code appears in Figure 8.

procedure synch($R$);
begin
1:    repeat until $R_i \ne \bot$;    /* wait */
2:    $l := 1$;
3:    while $l < n$ and $T_{(i+l) \bmod n} = \bot$ do
4:        repeat until $R_{(i+l) \bmod n} \ne \bot$;    /* wait */
5:        $R_i$ := concatenate($R_i$, $R_{(i+l) \bmod n}$);
6:        $l := |R_i|$;
      od;
end;

Figure 8: Fast non-wait-free synchronization. Code for process $p_i$.

Again, we fix some execution $\alpha$ of fast-n-approx. We have:

Theorem 6.4 Let $\alpha'$ be a finite prefix of $\alpha$. Assume that in $\alpha'$ all $R$ entries are set to values
$\ne \bot$, and that synch$_i$ is invoked by $p_i$. Then synch$_i$ returns within at most $3n$ steps by $p_i$ after
the end of $\alpha'$.

Theorem 6.5 Let $\alpha'$ be a finite prefix of $\alpha$. Assume that in $\alpha'$, synch$_i$ returns, for some $p_i$.
Then at the end of $\alpha'$ either all $R$ entries are $\ne \bot$ or $T_j$ = true for some $j$.

Let $\alpha'$ be a finite prefix of $\alpha$. Note that, in fast-n-approx, if $p_i$ terminates synch$_i$, i.e., aborts
or returns, then within one time unit, $T_i$ = true. This is crucial in the proof of the next
theorem.

Theorem 6.6 Let $\alpha'$ be a finite prefix of $\alpha$. Assume that in $\alpha'$ all $R$ entries are set to values
$\ne \bot$, and synch$_i$ is invoked by $p_i$, for every $i \in \{0, \ldots, n-1\}$. Then every process terminates
synch within at most $O(\log n)$ time after the end of $\alpha'$.

Proof: Let $t$ be the time of the last event of $\alpha'$. We prove that for every process $p_i$ and
for every integer $r$, $0 \le r \le 2\lceil \log n \rceil$, by time $t + 3r$, either $p_i$ terminates synch$_i$ or
$|R_i| \ge \min\{2^{\lfloor r/2 \rfloor}, n\}$. The claim follows by taking $r = 2\lceil \log n \rceil$ and noticing that if $|R_i| \ge n$, then $p_i$
returns from synch$_i$ within $O(1)$ time.

The proof is by induction on $r$. The base cases, $r = 0, 1$, are trivial.

For the induction step, assume that $1 < r \le 2\lceil \log n \rceil$. If $p_i$ terminates by time $t + 3r$,
then the claim is immediate. So, assume $p_i$ does not terminate by time $t + 3r$. In particular,
it does not terminate by time $t + 3(r-1)$. Hence, by the induction hypothesis,
$|R_i| \ge \min\{2^{\lfloor (r-1)/2 \rfloor}, n\} = 2^{\lfloor (r-1)/2 \rfloor}$. Then process $p_i$ reads $R_{(i+l_i) \bmod n}$ after time $t + 3(r-1)$,
where $l_i$ is the length of $R_i$ at time $t + 3(r-1)$.

If $p_{(i+l_i) \bmod n}$ terminates by time $t + 3(r-1) - 1$, then, by assumption, $T_{(i+l_i) \bmod n}$ = true
by time $t + 3(r-1)$ and thus $p_i$ terminates by time $t + 3r$. Otherwise, it follows from the induction
hypothesis for $r - 2$ that $|R_{(i+l_i) \bmod n}| \ge 2^{\lfloor (r-2)/2 \rfloor}$. Then the length of $R_i$ at time $t + 3r$ is
larger than or equal to $2^{\lfloor (r-1)/2 \rfloor} + 2^{\lfloor (r-2)/2 \rfloor} \ge 2^{1 + \lfloor (r-2)/2 \rfloor} = 2^{\lfloor r/2 \rfloor}$. ■
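The floor arithmetic in the last step of this proof is easy to get wrong by one, so a quick mechanical check may be reassuring. The sketch below only verifies the two numeric relations for a range of $r$; the helper name `doubling_step_holds` is hypothetical.

```python
def doubling_step_holds(r: int) -> bool:
    """Check 2^floor((r-1)/2) + 2^floor((r-2)/2) >= 2^(1 + floor((r-2)/2)) = 2^floor(r/2)."""
    lhs = 2 ** ((r - 1) // 2) + 2 ** ((r - 2) // 2)
    middle = 2 ** (1 + (r - 2) // 2)
    return lhs >= middle and middle == 2 ** (r // 2)


# The induction step is only used for r > 1.
checked = all(doubling_step_holds(r) for r in range(2, 64))
```

For even $r$ the inequality holds with equality; for odd $r$ the left-hand side is strictly larger.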


6.4  Correctness  Proof 

We remind the reader that an execution of the algorithm is viewed as a sequence of primitive
atomic operations that are reads and writes of atomic registers. We now fix some execution $\alpha$
of fast-n-approx.

As in the proof of the 2-process algorithm (Section 5), the crucial point in the proof of the
algorithm is showing that, in Lines 3-4 of fast-n-approx, processes use “close” values for $c^a$ and
$c^b$. We show that the value of an arbitrary counter when some process invokes fast-collect is
at most 3 less than the maximal value this counter ever attains. This is formalized and proved
in the next lemma:

Lemma 6.7 Assume that $p_i$ invokes fast-collect$_i$ in $\alpha$. Fix some process $p_j$; let $k$ be the
value of $C_j$ returned by fast-collect$_i$. Let $k'$ be the maximum value written to $C_j$ in $\alpha$. Then
$k' - 3 \le k \le k'$.

Proof: The inequality $k \le k'$ follows immediately from the atomicity of the shared register.
To prove the other inequality, let $p_{i'}$ be the first process to execute the write operation in
Line 4 of increase-counter. Such a process exists because $p_i$ performs this write operation
before invoking fast-collect$_i$. Let $\alpha'$ be a shortest prefix of $\alpha$ that includes $p_{i'}$’s write to $T_{i'}$.
Let $k''$ be the value of $C_j$ at the end of $\alpha'$. Since any invocation of fast-collect follows this
write operation in Line 4, Theorem 6.1 and the atomicity of $C_j$ imply that $k'' \le k$. Thus,
it suffices to show that $k' - 3 \le k''$. There are two cases according to the way $p_{i'}$ exits the
alternate construct in Lines 2-3 of increase-counter:

Case 1: $p_{i'}$ exits the while loop. It must be that one of the halting conditions of the while
loop is false for $p_{i'}$. If $p_{i'}$ and $p_j$ are in the same group, i.e., $g_{i'} = g_j$, then $p_j$ will perform
at most one iteration of the while loop before $p_j$ also sees the corresponding condition to be
false. If $p_{i'}$ and $p_j$ are not in the same group, i.e., $g_{i'} \ne g_j$, then $p_j$ will perform at most one
iteration of the while loop before $p_j$ sees the first condition to be false (by observing $C_{i'} \ne \bot$).
The claim follows.

Case 2: $p_{i'}$ returns from synch$_{i'}$. It follows that for every process $p_l$, $T_l = \bot$ when $p_{i'}$ terminates
synch$_{i'}$. It follows from Theorem 6.5 that, for all $l \in \{0, \ldots, n-1\}$, the value of $C_l$ at the
end of $\alpha'$ is $\ne \bot$. By Theorem 6.4, $p_j$ will exit synch$_j$($C$) after performing at most $3n$ of its
own steps after $\alpha'$. It follows from the definition of alternate that $p_j$ will perform at most $3n$
steps in the while loop in Line 2 of increase-counter, before synch$_j$($C$) terminates. However,
each iteration of the while loop takes at least $n$ steps (since $n$ registers have to be read).
Thus, $p_j$ will perform at most three additional iterations of the while loop, before synch$_j$($C$)
terminates. The claim follows. ■

This implies that, for each local counter, the values read by two different processes differ
by at most 3. Hence, the values used by different processes for the joint counters $c^a$ and $c^b$
differ by at most $3n$. Formally, we have:

Lemma 6.8 Suppose $i, j \in \{0, \ldots, n-1\}$ and $g \in \{a, b\}$. Assume the values returned by
fast-collect$_i$ and fast-collect$_j$ calculate to $c_i^g$ and $c_j^g$, respectively. Then $|c_i^g - c_j^g| \le 3n$.

We can now prove that the algorithm satisfies the agreement property:

Lemma 6.9 If fast-n-approx$_i$ returns $y_i$ and fast-n-approx$_j$ returns $y_j$, then $|y_i - y_j| \le \varepsilon$.

Proof: The general outline of the proof parallels that of Lemma 5.2; however, some of the
details are different. First, the discrepancy between processes’ views of the joint counters might
be $3n$; to compensate for that, we use bias with $\varepsilon/6n$. In addition, we must allow for the
possibility of using different values from the same group (by applying Lemma 4.4). The details
follow.

We present the proof for the case where $p_i$ and $p_j$ are not in the same group; without loss
of generality, assume $g_i = a$ and $g_j = b$. The proof for the case where $p_i$ and $p_j$ are in the same
group follows from similar arguments and is left to the reader.

Assume that the values computed by $p_i$ based on fast-collect$_i$ to be used in Lines 3-4 of
fast-n-approx are $(v_i^a, v_i^b, c_i^a, c_i^b)$; similarly, assume that the values computed by $p_j$ based on
fast-collect$_j$ to be used in Lines 3-4 of fast-n-approx are $(v_j^a, v_j^b, c_j^a, c_j^b)$. Note that since $p_i$ is in
group $a$, $c_i^a \ge 0$ and $v_i^a \ne \bot$; similarly, since $p_j$ is in group $b$, $c_j^b \ge 0$ and $v_j^b \ne \bot$.

For any process $p_k$, denote by $\pi_k$ the write by process $p_k$ in Line 1 of increase-counter (if it
appears in $\alpha$). Since $p_i$ and $p_j$ decide, $\pi_i$ and $\pi_j$ must appear in $\alpha$. Let $p_{i'}$ be such that $\pi_{i'}$
is the first in $\alpha$. Assume, without loss of generality, that $p_{i'}$ is in group $a$. Intuitively, we
assume that the first process to start the second phase of the algorithm belongs to $p_i$’s group,
$a$.

The code of the algorithm implies that $\pi_{j'}$ precedes any calculation of $C^a$ by $p_{j'}$, for any
$p_{j'}$ in group $b$. Since $\pi_{i'}$ precedes $\pi_{j'}$, it follows that $p_{j'}$ will always calculate $C^a \ne \bot$. Thus,
$c_j^a \ge 0$ and hence fast-n-approx$_j$ returns in Line 4 and $v_j^a \ne \bot$. Also, the above implies that $C^b$
never increases beyond 0. Thus, $c_j^b = 0$ and $c_i^b \in \{\bot, 0\}$. We separate the rest of the proof into
two cases:

Case 1: $c_i^b = \bot$. Then fast-n-approx$_i$ returns $v_i^a$ in Line 3. From the code it follows that
$c_i^a \ge |v_i^a| 6n/\varepsilon$. By Lemma 6.8, $c_j^a \ge |v_i^a| 6n/\varepsilon - 3n$. Since $c_j^a \ge 0 = c_j^b$, applying Lemma 4.2 (2)
with $m = 3n$ we get that

$$|\text{bias}(v_i^a, v_j^b, c_j^a, c_j^b, \varepsilon/6n) - v_i^a| \le \varepsilon/2 . \qquad (1)$$

Also, Theorem 3.1 implies that $|v_i^a - v_j^a| \le \varepsilon/12$. Applying Lemma 4.4 with $\delta = \varepsilon/12$, $c^0 = c_j^a$,
$c^1 = c_j^b$, $v_0^0 = v_i^a$, $v_0^1 = v_j^b$, $v_1^0 = v_j^a$, $v_1^1 = v_j^b$, we get that

$$|\text{bias}(v_i^a, v_j^b, c_j^a, c_j^b, \varepsilon/6n) - \text{bias}(v_j^a, v_j^b, c_j^a, c_j^b, \varepsilon/6n)| \le 6\varepsilon/12 = \varepsilon/2 . \qquad (2)$$

From (1) and (2) it follows that

$$|\text{bias}(v_j^a, v_j^b, c_j^a, c_j^b, \varepsilon/6n) - v_i^a| \le \varepsilon ,$$

as needed.

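Case 2: $c_i^b = 0$. Thus, fast-n-approx$_i$ returns in Line 4 and $v_i^b \ne \bot$. We have that $\min\{c_i^a, c_i^b\} =
c_i^b = 0$ and $\min\{c_j^a, c_j^b\} = c_j^b = 0$. Also, $|c_i^a - c_j^a| + |c_i^b - c_j^b| = |c_i^a - c_j^a| \le 3n$ by Lemma 6.8.
Applying Lemma 4.3 with $m = 3n$ we get

$$|\text{bias}(v_i^a, v_i^b, c_i^a, c_i^b, \varepsilon/6n) - \text{bias}(v_i^a, v_i^b, c_j^a, c_j^b, \varepsilon/6n)| \le 3n \cdot \varepsilon/6n = \varepsilon/2 . \qquad (3)$$

Also, Theorems 3.1 and 3.10 imply that $|v_i^a - v_j^a| \le \varepsilon/12$ and $|v_i^b - v_j^b| \le \varepsilon/12$. By applying
Lemma 4.4 with $\delta = \varepsilon/12$ we get

$$|\text{bias}(v_i^a, v_i^b, c_j^a, c_j^b, \varepsilon/6n) - \text{bias}(v_j^a, v_j^b, c_j^a, c_j^b, \varepsilon/6n)| \le 6\varepsilon/12 = \varepsilon/2 . \qquad (4)$$

From (3) and (4) it follows that

$$|\text{bias}(v_i^a, v_i^b, c_i^a, c_i^b, \varepsilon/6n) - \text{bias}(v_j^a, v_j^b, c_j^a, c_j^b, \varepsilon/6n)| \le \varepsilon ,$$

as needed.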


We  have: 

Theorem 6.10 Procedure fast-n-approx is a wait-free algorithm for the n-process approximate
agreement problem whose time complexity is $O(\log n)$.

Proof: Agreement follows from Lemma 6.9. Validity follows immediately since the values
returned by wait-free-approx and wait-approx are in the range of the original inputs, and the
bias function preserves this property (Lemma 4.1).

The algorithm is wait-free because the first alternative of each alternation construct and
fast-collect are wait-free.

Within $O(1)$ time all processes finish n-to-2. Thus, within $O(1)$ time all processes start
procedure increase-counter, write to $C_i$, and invoke synch. By Theorem 6.6, within $O(\log n)$ time
each process terminates synch. Thus, within $O(\log n)$ time all processes exit increase-counter
and invoke fast-collect. By Theorem 6.3, all processes return from fast-collect within $O(\log n)$
time. Hence, the total time complexity is $O(\log n)$. ■


7  A log n Time Lower Bound


In this section, we show that the $\log n$ dependency exhibited by the algorithm of Theorem 6.10 is
inherent: the time complexity of any wait-free algorithm for n-process approximate agreement
is at least $\log n$. Together with Theorem 3.1, this result shows that there are problems for which
wait-free algorithms take more time (by an $\Omega(\log n)$ factor) than non-wait-free algorithms.

In the rest of this section, we assume that each process has only one register to which it
can write. Since the size of registers is not restricted and since only one process may write to
each register, there is no loss of generality in this assumption. Let $R_i$ be the register to which
$p_i$ writes. For a configuration $C$ and a process $p_i$, let $st(p_i, C)$ be the pair consisting of the
local state of $p_i$ and the value of $R_i$ in $C$, i.e., $st(p_i, C) = (state(p_i, C), val(R_i, C))$.

The synchronized schedule is the schedule in which processes take steps in round-robin
order starting with $p_0$, essentially operating synchronously. The sequence of $r$ rounds in the
round-robin order is denoted $\sigma_r$. For any configuration $C$, the corresponding synchronized
execution from $C$ is uniquely determined by the algorithm. Note that this is a 0-admissible
execution.

We now define the set of processes that could have influenced $p_i$’s state at time $r$ in the
synchronized execution from a configuration $C$. Let $C$ be a configuration; by induction on
$r \ge 0$, define the set $INF(p_i, r, C)$, for every $i \in \{0, \ldots, n-1\}$, using the following rules:

1. $r = 0$: $INF(p_i, r, C) = \{p_i\}$, for every $i \in \{0, \ldots, n-1\}$.

2. $r \ge 1$: if $p_i$’s $r$th step in $(C, \sigma_r)$ is a read of $R_j$, then
$INF(p_i, r, C) = INF(p_i, r-1, C) \cup INF(p_j, r-1, C)$. If $p_i$’s $r$th step is a write (to $R_i$) then $INF(p_i, r, C) = INF(p_i, r-1, C)$.
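The two rules translate directly into a short computation. The sketch below assumes the synchronized execution is summarized by a table `reads`, where `reads[r][i]` is the index of the register that $p_i$ reads in round $r + 1$, or `None` if that step is a write; the table format and the function name `influence_sets` are assumptions made for the example.

```python
from typing import List, Optional, Set


def influence_sets(reads: List[List[Optional[int]]], n: int) -> List[Set[int]]:
    """Return INF(p_i, r, C) for every i, where r = len(reads)."""
    inf = [{i} for i in range(n)]            # rule 1: INF(p_i, 0, C) = {p_i}
    for round_reads in reads:
        prev = inf                           # the sets for round r - 1
        inf = [
            (prev[i] | prev[j]) if j is not None else set(prev[i])   # rule 2
            for i, j in enumerate(round_reads)
        ]
    return inf


# Example: 8 processes following a pointer-jumping read pattern.
n = 8
pattern = [[(i + 2 ** r) % n for i in range(n)] for r in range(3)]
after_two = influence_sets(pattern[:2], n)
after_three = influence_sets(pattern, n)
```

In this example each set has exactly $2^r$ members after $r$ rounds, so the pattern meets the growth limit that the lower-bound argument relies on: no process can be influenced by all 8 processes in fewer than 3 rounds.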

The next lemma formalizes the intuition that $INF$ includes all the processes that can influence
$p_i$’s state up to time $r$.

Lemma 7.1 Let $C_1$ and $C_2$ be two configurations, let $p_i$ be any process and let $r \ge 0$. If
$st(p_j, C_1) = st(p_j, C_2)$ for all $p_j \in INF(p_i, r, C_1)$, then $st(p_i, C_1\sigma_r) = st(p_i, C_2\sigma_r)$.

Proof: The proof is by induction on $r$. The base case, $r = 0$, is trivial since in this case,
$\sigma_0 = \lambda$, $INF(p_i, 0, C_1) = \{p_i\}$ and the claim follows from the assumptions.

To prove the induction step, assume the claim holds for $r - 1$. If $p_i$’s $r$th step is a write
then the claim follows immediately from the induction hypothesis since
$INF(p_i, r-1, C_1) = INF(p_i, r, C_1)$.

If $p_i$’s $r$th step is a read from $R_j$ then $INF(p_j, r-1, C_1) \subseteq INF(p_i, r, C_1)$. The induction
hypothesis implies that $st(p_j, C_1\sigma_{r-1}) = st(p_j, C_2\sigma_{r-1})$. By the same reasoning,
$st(p_i, C_1\sigma_{r-1}) = st(p_i, C_2\sigma_{r-1})$. Thus, $st(p_i, C_1\sigma_r) = st(p_i, C_2\sigma_r)$, as needed. ■

We can now prove:


Theorem 7.2 Any wait-free algorithm for the n-process approximate agreement problem has
time complexity at least $\log n$.


Proof: Assume that $A$ is a wait-free approximate agreement algorithm. We prove a slightly
stronger claim: there exists a 0-admissible execution $\alpha$ in which no process decides before
time $\log n$. Suppose, by way of contradiction, that in all 0-admissible executions some process
decides before time $\log n$.

Fix some $\varepsilon < 1$. Let $\sigma$ be the infinite synchronized schedule, i.e., the limit of $\sigma_r$. Consider
the execution of $A$ under $\sigma$ from the initial configuration $C_0$ where processes start with inputs
$(0, \ldots, 0)$. Let $T$ be the time associated with the first decision event in $(C_0, \sigma)$; without loss of
generality, let $p_0$ be the process associated with this event. By assumption, $T < \log n$. By the
validity property, $p_0$ must decide on 0 since all processes start with 0. Note that, by induction
on $r$, $|INF(p_i, r, C)| \le 2^r$, for every configuration $C$, $r \ge 0$ and $i \in \{0, \ldots, n-1\}$. Since
$T < \log n$ it must be that $|INF(p_0, T, C_0)| \le 2^T < n$. Thus, there exists some process that is
not in $INF(p_0, T, C_0)$; without loss of generality, assume $p_{n-1} \notin INF(p_0, T, C_0)$.
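The counting step used in this paragraph follows from the two rules that define $INF$; spelled out as a short derivation (nothing beyond those rules is assumed):

```latex
\begin{align*}
|INF(p_i, 0, C)| &= |\{p_i\}| = 1 = 2^0, \\
|INF(p_i, r, C)| &\le |INF(p_i, r-1, C)| + |INF(p_j, r-1, C)|
                  \le 2^{r-1} + 2^{r-1} = 2^r \qquad (r \ge 1),
\end{align*}
```

where $R_j$ is the register read in $p_i$'s $r$th step (for a write step the set does not grow at all). Taking $r = T$ with $T < \log n$ gives $2^T < n$, so at least one process lies outside $INF(p_0, T, C_0)$.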

Intuitively, to complete the proof, we create an alternative execution in which $p_{n-1}$ “starts
early” with input 1, runs on its own and thus must eventually decide on 1. We then let
the rest of the processes execute as if they are in the synchronized execution from $C_0$ and use
Lemma 7.1 to show that process $p_0$ still decides on 0, which is a contradiction to the agreement
property, since $\varepsilon < 1$.

More precisely, apply $\tau$, an infinite schedule consisting of steps of $p_{n-1}$ only, to the initial
configuration $C_2$, where processes start with inputs $(1, \ldots, 1)$. The resulting execution $(C_2, \tau)$
is $(n-1)$-admissible, and thus, since $A$ $(n-1)$-solves the approximate agreement problem,
and since $p_{n-1}$ is nonfaulty in $\tau$, there exists a finite prefix $\tau'$ of $\tau$ in which $p_{n-1}$ decides. By
validity, $p_{n-1}$ decides on 1. Now apply $\tau'$ to the initial configuration $C_1$ where all processes
but $p_{n-1}$ start with input 0, and $p_{n-1}$ starts with input 1. By induction on the prefixes of $\tau'$,
it follows that $st(p_{n-1}, C_1\tau') = st(p_{n-1}, C_2\tau')$. Thus $p_{n-1}$ decides on 1 in $C_1\tau'$. Since $p_{n-1}$
can write only to $R_{n-1}$, it follows that for all processes $p_j \ne p_{n-1}$, $st(p_j, C_1\tau') = st(p_j, C_0)$. By
Lemma 7.1, $state(p_0, C_1\tau'\sigma_T) = state(p_0, C_0\sigma_T)$. Thus, $p_0$ decides on 0 in $C_1\tau'\sigma_T$, and $p_{n-1}$
decides on 1, which is a contradiction to agreement, since $\varepsilon < 1$. ■


8  A  Tradeoff  Between  Work  and  Time 

We now consider the performance of wait-free algorithms when failures occur. A drawback
of the fast algorithms we have presented in this paper is that if a failure does occur, then
the remaining processes will have to take many steps before halting. We show that this
phenomenon is unavoidable. Roughly speaking, we prove that if an algorithm terminates
in a small number of steps in executions where failures do occur, then it is slow in normal
executions. In the rest of this section we restrict our attention to the 2-process case.

Recall that the work performed by an algorithm is defined to be the maximum, over all
executions, of the total number of operations performed by all processes before deciding. The
lower bound presented here is slightly stronger: it bounds the number of operations a single
process performs before deciding when running on its own. Clearly, this also gives a lower
bound on the work.

Let $k \ge 1$ be an integer. An algorithm is k-bounded if from any reachable configuration, a
process that executes $k$ consecutive steps on its own must decide. Fix a k-bounded wait-free
algorithm $A$ for approximate agreement; all definitions and lemmas in the rest of this section
are with respect to $A$. For each process $p_i$ and a configuration $C$, reachable in an execution
of $A$, define $pref_i(C)$, the preference of $p_i$ in $C$, to be the value on which $p_i$ decides in the
execution fragment starting from $C$ in which it runs alone until it decides.

A finite schedule is a block if it consists of a positive number of events by $p_0$ followed by
one event by $p_1$, or vice versa.

Lemma 8.1 Let $\sigma$ be a finite schedule, and let $C_0$ be an initial configuration. Let $C = C_0\sigma$.
There exists a finite block schedule $\sigma'$ such that

$$|pref_0(C\sigma') - pref_1(C\sigma')| \ge \frac{1}{2k} |pref_0(C) - pref_1(C)| .$$

Proof: The proof considers the tree of all execution fragments of time 1 from $C$. A case
analysis, according to the types of steps taken, similar to the one in [33], is used to show
that it cannot be that all the pairs of preferences associated with leaves of this tree are close
together. The details follow.

Let $\tau_0 = 0^k$, i.e., the schedule consisting of $k$ events of $p_0$. Similarly, let $\tau_1 = 1^k$. Let
$(C, \tau_0) = C, C_1, \ldots, C_k$ and $(C, \tau_1) = C, C'_1, \ldots, C'_k$. For any $l$, $1 \le l \le k$, denote $D_l = C_l 1$,
i.e., the configuration that results from applying an event of $p_1$ to $C_l$. Similarly, for any $l$,
$1 \le l \le k$, denote $D'_l = C'_l 0$. Denote $v_0^l = pref_0(D_l)$, $v_1^l = pref_1(D_l)$, $u_0^l = pref_0(D'_l)$ and
$u_1^l = pref_1(D'_l)$.

Since $A$ is k-bounded, it must be that $p_0$ decides in $C\tau_0$; by definition, it must decide on
$pref_0(C)$. Similarly, $p_1$ decides on $pref_1(C)$ in $C\tau_1$. We show that for all $l$, $1 \le l < k$, either
$v_0^l = v_0^{l+1}$ or $v_1^l = v_1^{l+1}$. There are four cases, depending on the type of operation taken in $p_0$’s
step from $C_l$ to $C_{l+1}$ and in $p_1$’s step from $C_l$ to $D_l$.

1. $p_0$ writes and $p_1$ writes: commutativity implies that $v_0^l = v_0^{l+1}$.

2. $p_0$ reads and $p_1$ reads: commutativity implies that $v_0^l = v_0^{l+1}$.

3. $p_0$ writes and $p_1$ reads: $v_0^l = v_0^{l+1}$, since the state of $p_0$ is the same in $D_l 0$ and $D_{l+1}$.

4. $p_0$ reads and $p_1$ writes: $v_1^l = v_1^{l+1}$, since the state of $p_1$ is the same in $D_l$ and $D_{l+1}$.

By symmetric arguments we can show that for all $l$, $1 \le l < k$, either $u_0^l = u_0^{l+1}$ or $u_1^l = u_1^{l+1}$.

In a similar manner we show that either $v_1^1 = u_1^1$ or $v_0^1 = u_0^1$, by case analysis, depending on
the type of operation taken in $p_0$’s step from $C$ to $C_1$ and in $p_1$’s step from $C$ to $C'_1$:

1. $p_0$ writes and $p_1$ writes: commutativity implies that $v_0^1 = u_0^1$.

2. $p_0$ reads and $p_1$ reads: commutativity implies that $v_0^1 = u_0^1$.

3. $p_0$ writes and $p_1$ reads: $v_0^1 = u_0^1$, since the state of $p_0$ is the same in $D_1$ and $D'_1$.

4. $p_0$ reads and $p_1$ writes: $v_1^1 = u_1^1$, since the state of $p_1$ is the same in $D_1$ and $D'_1$.

Thus, either there exists some $l$ such that $|v_0^l - v_1^l| \ge \frac{1}{2k}|pref_0(C) - pref_1(C)|$, or there exists
some $l$ such that $|u_0^l - u_1^l| \ge \frac{1}{2k}|pref_0(C) - pref_1(C)|$. In the first case, the claim follows by
taking $\sigma' = 0^l 1$; in the second case, the claim follows by taking $\sigma' = 1^l 0$. ■

Note that the validity condition implies that if $p_i$’s input in an initial configuration $C$ is
$v_i$ then $pref_i(C) = v_i$. Starting with this fact and applying Lemma 8.1 iteratively, we can bound
the rate at which a k-bounded algorithm converges. We get:

Theorem 8.2 Let $A$ be a k-bounded wait-free algorithm for approximate agreement between
2 processes. Then there exists an execution of $A$ where processes start with inputs $(x_0, x_1)$ in
which the time complexity is at least $\Omega\left(\log_{2k} \frac{|x_0 - x_1|}{\varepsilon}\right)$.

Proof: Let $C$ be an initial configuration in which processes have inputs $(x_0, x_1)$. We construct,
inductively, a schedule $\sigma_l$ such that $\sigma_l$ is a sequence of $l$ blocks and for $C_l = C\sigma_l$,

$$|pref_0(C_l) - pref_1(C_l)| \ge \left(\frac{1}{2k}\right)^l |pref_0(C) - pref_1(C)| .$$

This is done by applying Lemma 8.1. We have that $time(\sigma_l) = l$, since $\sigma_l$ consists of $l$ blocks.
The validity condition implies that $pref_i(C) = x_i$. Thus, $|pref_0(C) - pref_1(C)| = |x_0 - x_1|$. The
claim follows by noticing that it cannot be that both $p_0$ and $p_1$ have decided in a configuration
$D$ if $|pref_0(D) - pref_1(D)| > \varepsilon$. ■
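To get a feel for the bound, the following sketch counts how many blocks can pass while the guaranteed gap between the two preferences still exceeds $\varepsilon$. It only iterates the $1/(2k)$ shrink factor from the proof; the function name `guaranteed_undecided_blocks` is hypothetical, and exact rational arithmetic is used to avoid rounding effects at the boundary.

```python
from fractions import Fraction


def guaranteed_undecided_blocks(x0, x1, eps, k: int) -> int:
    """Count how often the gap |x0 - x1| can shrink by 1/(2k) and remain above eps.

    After that many blocks the adversarial schedule still keeps the two
    preferences more than eps apart, so not both processes can have decided.
    """
    gap = Fraction(abs(x0 - x1))
    eps = Fraction(eps)
    blocks = 0
    while gap / (2 * k) > eps:
        gap /= 2 * k
        blocks += 1
    return blocks
```

For instance, with inputs 0 and 256, $\varepsilon = 1$ and $k = 2$, the gap can shrink by a factor of 4 three times (to 64, 16 and 4) before the next step would reach $\varepsilon$, so the function returns 3. The count grows like $\log_{2k}(|x_0 - x_1|/\varepsilon)$, in line with the theorem.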

Remark 8.1 The case analysis in the proof of Lemma 8.1 can be extended to handle multi-writer
multi-reader registers; thus, the above tradeoff applies also to algorithms that use multi-writer
multi-reader atomic registers.


9  Properties of the Bias Function


In this section the interested reader may find the long postponed proofs of Lemmas 4.1 through 4.4.
We begin with the rather straightforward proof of Lemma 4.1.

Lemma 4.1 Let $c^0, c^1$ be nonnegative integers, and $v^0, v^1, \varepsilon$ be real numbers, with $\varepsilon > 0$. Then
$\text{bias}(v^0, v^1, c^0, c^1, \varepsilon) \in \text{range}(\{v^0, v^1\})$.

Proof: Let $y = \text{bias}(v^0, v^1, c^0, c^1, \varepsilon)$. The claim is trivial if $y$ is calculated in Line 1. Suppose
$y$ is calculated in Line 2. (The case where $y$ is calculated in Line 3 is symmetric.) Then

$$y = v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}\left(|v^1| - \min\{c^1\varepsilon, |v^1|\}\right) .$$

If the min is attained in the second term, then $y = v^1$
and the claim follows. So assume $c^1\varepsilon < |v^1|$, so $y = v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}(|v^1| - c^1\varepsilon)$. Assume $v^1 \ge v^0$.
(A symmetric argument applies when $v^1 < v^0$.) Then $v^0 - v^1 \le 0$, and clearly $y \le v^1$. Since
$\left|\frac{v^0 - v^1}{|v^0| + |v^1|}(|v^1| - c^1\varepsilon)\right| \le v^1 - v^0$, it follows that $y \ge v^0$. ■
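For experimenting with these lemmas, here is a minimal Python sketch of a bias function pieced together from the formulas that appear in the proofs of this section. It is not a transcription of the original definition, which is given earlier in the paper: in particular, which line handles the tie $c^0 = c^1$ is an assumption of this sketch.

```python
def bias(v0: float, v1: float, c0: int, c1: int, eps: float) -> float:
    """Sketch of bias(v0, v1, c0, c1, eps), reconstructed from the proofs.

    The larger counter pulls the result towards "its" value, by eps per count.
    """
    if v0 == 0 and v1 == 0:                      # Line 1
        return 0.0
    total = abs(v0) + abs(v1)
    if c0 < c1:                                  # Line 2 (tie-break is an assumption)
        return v1 + (v0 - v1) / total * (abs(v1) - min(c1 * eps, abs(v1)))
    # Line 3: symmetric case, biased towards v0
    return v0 + (v1 - v0) / total * (abs(v0) - min(c0 * eps, abs(v0)))
```

With `v0 = 0`, `v1 = 10`, `c0 = 0`, `c1 = 7` and `eps = 1`, the result is 7: it lies between the two values, as Lemma 4.1 requires, and is within 3 of `v1`. Once `c1 * eps` reaches `abs(v1)`, the result is exactly `v1`.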

The following is the proof of Lemma 4.2.

Lemma 4.2 Let $c^0, c^1$ be nonnegative integers, and $v^0, v^1, \varepsilon, m$ be real numbers, $\varepsilon > 0$, $m \ge 0$.

(1) Suppose $c^1 > c^0$ and $|v^1|/\varepsilon - m \le c^1$. Then $|\text{bias}(v^0, v^1, c^0, c^1, \varepsilon) - v^1| \le m\varepsilon$.

(2) Suppose $c^0 \ge c^1$ and $|v^0|/\varepsilon - m \le c^0$. Then $|\text{bias}(v^0, v^1, c^0, c^1, \varepsilon) - v^0| \le m\varepsilon$.


Proof: We present the proof only for (2); the proof for (1) follows from symmetric arguments.
Let $y = \text{bias}(v^0, v^1, c^0, c^1, \varepsilon)$. If $y$ is calculated in Line 1 of bias, then $y = 0$ and $v^0 = 0$ and the
claim follows. Hence, since $c^0 \ge c^1$ it follows that $y$ is calculated in Line 3 of bias, i.e.,

$$y = v^0 + \frac{v^1 - v^0}{|v^0| + |v^1|}\left(|v^0| - \min\{c^0\varepsilon, |v^0|\}\right) .$$

If the min attains its value in the second term then $y = v^0$, and the claim follows. Otherwise,
$c^0\varepsilon < |v^0|$; thus,

$$|y - v^0| = \left|v^0 + \frac{v^1 - v^0}{|v^0| + |v^1|}(|v^0| - c^0\varepsilon) - v^0\right| = \frac{|v^1 - v^0|}{|v^0| + |v^1|}\,\bigl||v^0| - c^0\varepsilon\bigr| \le \bigl||v^0| - c^0\varepsilon\bigr| = |v^0| - c^0\varepsilon \le m\varepsilon ,$$

by the hypothesis of the lemma.

Next is the proof of Lemma 4.3.


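Lemma 4.3 Let $c_0^0, c_0^1, c_1^0, c_1^1$ be nonnegative integers, and $v^0, v^1, \varepsilon, m$ be real numbers, $\varepsilon > 0$
and $m \ge 0$. Suppose $\min\{c_0^0, c_0^1\} = \min\{c_1^0, c_1^1\} = 0$ and $|c_0^0 - c_1^0| + |c_0^1 - c_1^1| \le m$. Then

$$|\text{bias}(v^0, v^1, c_0^0, c_0^1, \varepsilon) - \text{bias}(v^0, v^1, c_1^0, c_1^1, \varepsilon)| \le m\varepsilon .$$


Proof: Let $y_0 = \text{bias}(v^0, v^1, c_0^0, c_0^1, \varepsilon)$, and $y_1 = \text{bias}(v^0, v^1, c_1^0, c_1^1, \varepsilon)$.

If $v^0 = v^1 = 0$ then both $y_0$ and $y_1$ are calculated in Line 1 of bias, i.e., $y_0 = y_1 = 0$ and
the claim follows.

Now assume $y_0$ is calculated in Line 2 of bias, while $y_1$ is calculated in Line 3 of bias
(the reverse case is symmetric). Thus, $c_0^0 < c_0^1$, while $c_1^1 \le c_1^0$. Thus, by assumption,
$c_0^0 = c_1^1 = 0$. Since $|c_0^0 - c_1^0| + |c_0^1 - c_1^1| \le m$, it follows that $|c_1^0| + |c_0^1| = c_1^0 + c_0^1 \le m$. Thus,
$\min\{c_1^0, |v^0|/\varepsilon\} + \min\{c_0^1, |v^1|/\varepsilon\} \le m$. So, $\min\{c_1^0\varepsilon, |v^0|\} + \min\{c_0^1\varepsilon, |v^1|\} \le m\varepsilon$. We have

$$y_0 = v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}\left(|v^1| - \min\{c_0^1\varepsilon, |v^1|\}\right) \quad\text{and}\quad y_1 = v^0 + \frac{v^1 - v^0}{|v^0| + |v^1|}\left(|v^0| - \min\{c_1^0\varepsilon, |v^0|\}\right) .$$

Thus,

$$\begin{aligned}
|y_0 - y_1| &= \left|v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}\left(|v^1| - \min\{c_0^1\varepsilon, |v^1|\}\right) - v^0 - \frac{v^1 - v^0}{|v^0| + |v^1|}\left(|v^0| - \min\{c_1^0\varepsilon, |v^0|\}\right)\right| \\
&= \left|v^1 - v^0 + \frac{v^0 - v^1}{|v^0| + |v^1|}\left((|v^0| + |v^1|) - \min\{c_0^1\varepsilon, |v^1|\} - \min\{c_1^0\varepsilon, |v^0|\}\right)\right| \\
&= \left|\frac{v^1 - v^0}{|v^0| + |v^1|}\left(\min\{c_0^1\varepsilon, |v^1|\} + \min\{c_1^0\varepsilon, |v^0|\}\right)\right| \\
&\le \left|\min\{c_0^1\varepsilon, |v^1|\} + \min\{c_1^0\varepsilon, |v^0|\}\right| = \min\{c_0^1\varepsilon, |v^1|\} + \min\{c_1^0\varepsilon, |v^0|\} \le m\varepsilon ,
\end{aligned}$$

as needed.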

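Now assume that both $y_0$ and $y_1$ are calculated in Line 2 of bias (the case where $y_0$ and $y_1$ are calculated in Line 3 of bias is symmetric), i.e.,

\[
y_0 = v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}\,\bigl(|v^1| - \min\{c_0^1\varepsilon, |v^1|\}\bigr)
\quad\text{and}\quad
y_1 = v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}\,\bigl(|v^1| - \min\{c_1^1\varepsilon, |v^1|\}\bigr) .
\]

If for $y_0$ the min is attained in the second term, then $c_0^1\varepsilon \ge |v^1|$, and $y_0 = v^1$; since $|c_0^1 - c_1^1| \le m$ it follows that $c_1^1 \ge |v^1|/\varepsilon - m$. Because $y_1$ is calculated in Line 2, $c_1^0 \le c_1^1$ and the claim follows from Lemma 4.2 (1). A similar argument applies if for $y_1$ the min is attained in the second term. So assume that for both $y_0$ and $y_1$ the min is attained in the first term. Thus,

\[
\begin{aligned}
|y_0 - y_1| &= \left| v^1 + \frac{v^0 - v^1}{|v^0| + |v^1|}\,\bigl(|v^1| - c_0^1\varepsilon\bigr) - v^1 - \frac{v^0 - v^1}{|v^0| + |v^1|}\,\bigl(|v^1| - c_1^1\varepsilon\bigr) \right| \\
&= \frac{|v^0 - v^1|}{|v^0| + |v^1|} \cdot \bigl| c_1^1\varepsilon - c_0^1\varepsilon \bigr| \\
&\le \bigl| c_1^1\varepsilon - c_0^1\varepsilon \bigr| = \varepsilon\,|c_1^1 - c_0^1| \le m\varepsilon ,
\end{aligned}
\]

as needed.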


■ 
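
As a sanity check on Lemma 4.3, the following Python sketch evaluates bias exactly (using rationals) over a small grid and confirms the bound. The body of `bias` is an assumption: it is reconstructed from the three cases used in the proofs (Line 1 returns 0 when both values are 0, Line 2 applies when $c^0 \le c^1$, Line 3 otherwise) rather than copied from the algorithm's definition. The grid values and the helper name `check_lemma_4_3` are likewise illustrative.

```python
from fractions import Fraction as F
from itertools import product


def bias(v0, v1, c0, c1, eps):
    """Assumed form of bias, reconstructed from the cases used in the proofs."""
    v0, v1, eps = F(v0), F(v1), F(eps)
    if v0 == 0 and v1 == 0:                      # Line 1
        return F(0)
    total = abs(v0) + abs(v1)
    if c0 <= c1:                                 # Line 2: start at v1, move toward v0
        return v1 + (v0 - v1) / total * (abs(v1) - min(c1 * eps, abs(v1)))
    # Line 3: start at v0, move toward v1
    return v0 + (v1 - v0) / total * (abs(v0) - min(c0 * eps, abs(v0)))


def check_lemma_4_3(values, epsilons, max_count):
    """Return the number of grid points checked; raise if the bound fails."""
    # Counter pairs with min{c^0, c^1} = 0, as the lemma requires.
    pairs = [(0, k) for k in range(max_count + 1)]
    pairs += [(k, 0) for k in range(1, max_count + 1)]
    checked = 0
    for v0, v1, eps in product(values, values, epsilons):
        for (a0, a1), (b0, b1) in product(pairs, pairs):
            m = abs(a0 - b0) + abs(a1 - b1)      # tightest m allowed by the lemma
            if m == 0:
                continue                         # the lemma assumes m > 0
            gap = abs(bias(v0, v1, a0, a1, eps) - bias(v0, v1, b0, b1, eps))
            assert gap <= m * F(eps), (v0, v1, a0, a1, b0, b1, eps)
            checked += 1
    return checked


values = [F(-3), F(-1), F(0), F(1, 2), F(2)]
print(check_lemma_4_3(values, [F(1, 2), F(1)], max_count=4))
```

Exact rational arithmetic is used so that cases where the bound holds with equality (for example $v^0 = 2$, $v^1 = -3$, $\varepsilon = 1$, counters $(0,1)$ versus $(1,0)$) are not misreported because of floating-point rounding.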


In  the  proof  of  the  next  lemma  we  use  the  following  two  facts: 

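Claim 9.1 If $x, y, x', y'$ are real numbers, and for some $\delta$, $|x - x'| \le \delta$ and $|y - y'| \le \delta$, then

\[
\left| \frac{|y|\,(x - y)}{|x| + |y|} - \frac{|y'|\,(x' - y')}{|x'| + |y'|} \right| \le 3\delta .
\]

We prove this claim by first showing the corresponding $3\delta$ bound without the inner absolute values, using calculus, and then handling the absolute values by case analysis.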


Claim 9.2 If $x, y, x', y'$ are real numbers, and for some $\delta$, $|x - x'| \le \delta$ and $|y - y'| \le \delta$, then

We prove this claim by straightforward calculations and a case analysis. Finally, we can now prove Lemma 4.4.


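Lemma 4.4 Let $c^0, c^1$ be nonnegative integers, and $v_0^0, v_0^1, v_1^0, v_1^1, \varepsilon, \delta$ be real numbers, with $\varepsilon > 0$, $\delta > 0$. Suppose $|v_0^0 - v_1^0| \le \delta$ and $|v_0^1 - v_1^1| \le \delta$. Then

\[
|\mathrm{bias}(v_0^0, v_0^1, c^0, c^1, \varepsilon) - \mathrm{bias}(v_1^0, v_1^1, c^0, c^1, \varepsilon)| \le 6\delta .
\]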


Proof: Let $y_0 = \mathrm{bias}(v_0^0, v_0^1, c^0, c^1, \varepsilon)$, and $y_1 = \mathrm{bias}(v_1^0, v_1^1, c^0, c^1, \varepsilon)$. If $v_0^0 = v_0^1 = 0$ then $y_0 = 0$. Thus, $|v_1^0| \le \delta$ and $|v_1^1| \le \delta$. So from Lemma 4.1 it follows that $|y_1| \le \delta$ and the claim follows. The case $v_1^0 = v_1^1 = 0$ follows from symmetric arguments. So assume at least one of $v_0^0, v_0^1$ is nonzero, and similarly for at least one of $v_1^0, v_1^1$.

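Assume that $c^0 \le c^1$, i.e., $y_0$ and $y_1$ are calculated in Line 2. (The other case, where $c^1 < c^0$ and $y_0$ and $y_1$ are calculated in Line 3, is symmetric.) Then

\[
y_0 = v_0^1 + \frac{v_0^0 - v_0^1}{|v_0^0| + |v_0^1|}\,\bigl(|v_0^1| - \min\{c^1\varepsilon, |v_0^1|\}\bigr)
\quad\text{and}\quad
y_1 = v_1^1 + \frac{v_1^0 - v_1^1}{|v_1^0| + |v_1^1|}\,\bigl(|v_1^1| - \min\{c^1\varepsilon, |v_1^1|\}\bigr) .
\]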


First, assume the min for $y_0$ is attained in the second term; then $y_0 = v_0^1$. In this case, if the min for $y_1$ is also attained in the second term, then $y_1 = v_1^1$ and the claim follows. On the other hand, suppose the min for $y_1$ is attained in the first term. Since the min for $y_0$ is attained in the second term, $c^1\varepsilon \ge |v_0^1| \ge |v_1^1| - \delta$. Applying Lemma 4.2 (1) with $m = \delta/\varepsilon$, we get that $|y_1 - v_1^1| \le \delta$. Since $|v_0^1 - v_1^1| \le \delta$, we have $|y_0 - y_1| \le 2\delta$.

Now assume that in both cases the min is attained in the first term. In particular, $c^1\varepsilon \le |v_0^1|$ and $c^1\varepsilon \le |v_1^1|$. We have,


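\[
\begin{aligned}
|y_0 - y_1| &= \left| v_0^1 + \frac{v_0^0 - v_0^1}{|v_0^0| + |v_0^1|}\,\bigl(|v_0^1| - c^1\varepsilon\bigr) - v_1^1 - \frac{v_1^0 - v_1^1}{|v_1^0| + |v_1^1|}\,\bigl(|v_1^1| - c^1\varepsilon\bigr) \right| \\
&\le \delta + \left| \frac{v_0^0 - v_0^1}{|v_0^0| + |v_0^1|}\,\bigl(|v_0^1| - c^1\varepsilon\bigr) - \frac{v_1^0 - v_1^1}{|v_1^0| + |v_1^1|}\,\bigl(|v_1^1| - c^1\varepsilon\bigr) \right| \\
&\le \delta + \left| \frac{|v_0^1|\,(v_0^0 - v_0^1)}{|v_0^0| + |v_0^1|} - \frac{|v_1^1|\,(v_1^0 - v_1^1)}{|v_1^0| + |v_1^1|} \right| + c^1\varepsilon \left| \frac{v_0^0 - v_0^1}{|v_0^0| + |v_0^1|} - \frac{v_1^0 - v_1^1}{|v_1^0| + |v_1^1|} \right| \\
&\le 4\delta + c^1\varepsilon \left| \frac{v_0^0 - v_0^1}{|v_0^0| + |v_0^1|} - \frac{v_1^0 - v_1^1}{|v_1^0| + |v_1^1|} \right| , \quad \text{by Claim 9.1,} \\
&\le 4\delta + c^1\varepsilon\,\frac{2\delta}{\min(|v_0^1|, |v_1^1|)} , \quad \text{by Claim 9.2,} \\
&\le 4\delta + 2\delta = 6\delta .
\end{aligned}
\]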
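
The $6\delta$ bound of Lemma 4.4 can be probed numerically in the same spirit. The sketch below is not part of the proof: `bias` is again an assumed reconstruction from the three cases used in the proofs (Line 1, Line 2 when $c^0 \le c^1$, Line 3 otherwise), and the grid and the helper name `worst_ratio` are illustrative. It reports the largest observed ratio $|y_0 - y_1| / \delta$, taking $\delta$ as the larger of the two input differences.

```python
from fractions import Fraction as F
from itertools import product


def bias(v0, v1, c0, c1, eps):
    """Assumed form of bias, reconstructed from the cases used in the proofs."""
    v0, v1, eps = F(v0), F(v1), F(eps)
    if v0 == 0 and v1 == 0:                      # Line 1
        return F(0)
    total = abs(v0) + abs(v1)
    if c0 <= c1:                                 # Line 2
        return v1 + (v0 - v1) / total * (abs(v1) - min(c1 * eps, abs(v1)))
    return v0 + (v1 - v0) / total * (abs(v0) - min(c0 * eps, abs(v0)))  # Line 3


def worst_ratio(values, counts, epsilons):
    """Largest observed |y0 - y1| / delta over the grid (the lemma says <= 6)."""
    worst = F(0)
    for a0, a1, b0, b1 in product(values, repeat=4):
        # Smallest delta satisfying both hypotheses of the lemma.
        delta = max(abs(a0 - b0), abs(a1 - b1))
        if delta == 0:
            continue                             # the lemma assumes delta > 0
        for c0, c1, eps in product(counts, counts, epsilons):
            gap = abs(bias(a0, a1, c0, c1, eps) - bias(b0, b1, c0, c1, eps))
            worst = max(worst, gap / delta)
    return worst


values = [F(-2), F(-1, 2), F(0), F(1, 3), F(1), F(5, 2)]
worst = worst_ratio(values, range(3), [F(1, 2), F(1)])
print(worst, worst <= 6)
```

A grid search of this kind can only fail to find a counterexample; it says nothing about how tight the constant 6 is.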

10 Discussion and Further Research

For approximate agreement, the answer to the question of whether wait-free algorithms are fast is not binary but quantitative: we have presented a relatively fast, O(log n) time, wait-free algorithm for n-process approximate agreement. On the other hand, log n is a lower bound on the time complexity of any wait-free approximate agreement algorithm, and there exists an O(1) time non-wait-free algorithm.

Using  the  emulators  of  [5],  our  algorithms  can  be  translated  into  algorithms  that  work 
in  message-passing  systems.  The  algorithms  have  the  same  time  complexity  (in  complete 
networks)  and  are  resilient  to  the  failure  of  a  majority  of  the  processes. 

There  are  many  ways  in  which  our  work  can  be  extended.  An  interesting  direction  is  to 
consider  the  impact  on  our  results  of  using  other  shared  memory  primitives.  For  example, 
if powerful Read-Modify-Write registers are used, then a constant time wait-free approximate
agreement  algorithm  can  be  devised.  What  happens  if  multi-writer  multi-reader  registers  are 
used?  The  existence  of  faster  wait-free  algorithms  using  these  primitives  will  imply  a  lower 
bound  on  the  time  complexity  (in  normal  executions)  of  any  implementation  of  multi-writer 
registers from single-writer registers.

Another avenue of research is to see whether the techniques presented in this paper, both for algorithms and lower bounds, can be applied to other problems. We believe, for example, that the O(1) time algorithm for 2-process approximate agreement can be generalized to any decision problem of size 2, using the characterization result of [8]. It is interesting to explore whether similar results can be proved for problems that require repeated coordination (e.g., ℓ-exclusion).

Finally, there remains the fundamental unanswered question raised by this work: Can wait-free (highly resilient) computation be performed at the price of no more than a logarithmic slowdown? Even more strongly, are there O(log n) time wait-free algorithms for all problems that have wait-free solutions?

Following a preliminary version of our work, first steps were made towards answering this question in the context of randomized computation [46]. Based on the alternated-interleaving method presented in Section 6.2, it is shown that any decision problem that has a wait-free or expected wait-free¹⁰ solution algorithm has an expected wait-free algorithm with the same worst-case time complexity that takes only O(log n) expected time¹¹ in fault-free executions. However, the above question itself is still far from being answered.

¹⁰ An expected wait-free algorithm is a randomized algorithm that is only expected, rather than guaranteed, to terminate within a finite number of steps.

¹¹ This is optimal by a straightforward extension of our lower bound to the case of randomized computation (see [46]).